Unmasking Multivariate Outliers and Leverage Points

Journal of the American Statistical Association · 1990
Cited by 461 · Top 6% of papers in the same journal and year
ABS 4

Summary

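To address the difficulty of identifying outliers in multivariate data, the paper proposes replacing the classical mean-and-covariance approach with robust estimates, and designs a residual-distance plot that classifies the data into regular observations, vertical outliers, good leverage points, and bad leverage points.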

Abstract

Detecting outliers in a multivariate point cloud is not trivial, especially when there are several outliers. The classical identification method does not always find them, because it is based on the sample mean and covariance matrix, which are themselves affected by the outliers. That is how the outliers get masked. To avoid the masking effect, we propose to compute distances based on very robust estimates of location and covariance. These robust distances are better suited to expose the outliers. In the case of regression data, the classical least squares approach masks outliers in a similar way. Also here, the outliers may be unmasked by using a highly robust regression method. Finally, a new display is proposed in which the robust regression residuals are plotted versus the robust distances. This plot classifies the data into regular observations, vertical outliers, good leverage points, and bad leverage points. Several examples are discussed.
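
The four-way classification described in the abstract can be sketched as a simple rule on each point's position in the residual-distance plot. The sketch below is illustrative only: it assumes the standardized robust residuals and robust distances have already been computed by some robust method, and the function name `classify_point`, the cutoff values, and the sample points are all assumptions made for the example, since the abstract does not state them.

```python
import math

def classify_point(std_residual, robust_distance,
                   residual_cutoff, distance_cutoff):
    """Label one observation by its position in the residual-distance plot."""
    large_residual = abs(std_residual) > residual_cutoff
    large_distance = robust_distance > distance_cutoff
    if large_residual and large_distance:
        return "bad leverage point"
    if large_residual:
        return "vertical outlier"
    if large_distance:
        return "good leverage point"
    return "regular observation"

# Assumed cutoffs for illustration: 2.5 for standardized residuals, and the
# square root of the 0.975 chi-square quantile for distances. With 2 degrees
# of freedom that quantile has the closed form -2 * ln(0.025).
RESIDUAL_CUTOFF = 2.5
DISTANCE_CUTOFF = math.sqrt(-2.0 * math.log(0.025))

# Hypothetical (standardized residual, robust distance) pairs, not real data.
points = [(0.3, 1.1), (4.2, 0.9), (-0.5, 6.0), (-7.8, 5.4)]
labels = [classify_point(r, d, RESIDUAL_CUTOFF, DISTANCE_CUTOFF)
          for r, d in points]

for (r, d), label in zip(points, labels):
    print(f"residual={r:+.1f}  distance={d:.1f}  ->  {label}")
```

The two cutoffs are passed in as parameters so that a different number of explanatory variables, or a different convention for what counts as "large", only changes the inputs and not the rule itself.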

Statistics · Data mining · Regression analysis · Robust statistics