Procedures for the Identification of Multiple Outliers in Linear Models

Journal of the American Statistical Association · 1993
Cited by 95
ABS 4

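Summary

To address the difficulty of identifying multiple outliers in linear models, the paper proposes two new test procedures that separate the clean data from the potential outliers. The procedures do not require presetting the number of outliers or running Monte Carlo simulations to determine cutoff values, and they outperform existing methods.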

Abstract

We consider the problem of identifying and testing multiple outliers in linear models. The available outlier identification methods often do not succeed in detecting multiple outliers because they are affected by the observations they are supposed to identify. We introduce two test procedures for the detection of multiple outliers that appear to be less sensitive to this problem. Both procedures attempt to separate the data into a set of “clean” data points and a set of points that contain the potential outliers. The potential outliers are then tested to see how extreme they are relative to the clean subset, using an appropriately scaled version of the prediction error. The procedures are illustrated and compared to various existing methods, using several data sets known to contain multiple outliers. Also, the performances of both procedures are investigated by a Monte Carlo study. The data sets and the Monte Carlo indicate that both procedures are effective in the detection of multiple outliers in linear models and are superior to other methods, including methods based on robust fits (e.g., least median of squares residuals). In particular, the methods do not require presetting numbers of outliers to test for, do not require the efficiency level of an estimator, do not require Monte Carlo to determine cutoff values, are not highly computationally intensive, and are relatively resistant to both masking and swamping effects.
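The testing step described in the abstract, judging each potential outlier against a fit to the clean subset through a scaled prediction error, can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' procedure: the function name `scaled_prediction_errors` is hypothetical, the clean subset is assumed to be supplied by the caller (the abstract does not say how it is built), and the scaling used is the standard out-of-sample prediction standard error, s * sqrt(1 + x'(Xc'Xc)^{-1}x), which may differ in detail from the paper's.

```python
import numpy as np


def scaled_prediction_errors(X, y, clean_idx):
    """Scaled prediction errors for points outside a given clean subset.

    Hypothetical helper for illustration only. `clean_idx` is assumed to be
    chosen by the caller; how to choose it is not covered here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Boolean mask marking the rows treated as clean.
    clean = np.zeros(len(y), dtype=bool)
    clean[clean_idx] = True

    # Least-squares fit using only the clean rows.
    Xc, yc = X[clean], y[clean]
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

    # Residual standard deviation of the clean fit.
    resid = yc - Xc @ beta
    dof = Xc.shape[0] - Xc.shape[1]
    s = np.sqrt(resid @ resid / dof)

    # Quadratic form x' (Xc'Xc)^{-1} x for each held-out row x.
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    Xo = X[~clean]
    leverage = np.einsum("ij,jk,ik->i", Xo, XtX_inv, Xo)

    # Prediction error divided by its estimated standard error.
    d = (y[~clean] - Xo @ beta) / (s * np.sqrt(1.0 + leverage))
    return np.flatnonzero(~clean), d
```

Held-out points with a large |d| are extreme relative to the clean subset. The cutoff values and the rule for forming the clean subset are defined in the paper and are not reproduced in this sketch.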

Econometrics · Statistics · Linear models · Outlier detection