A Fast Procedure for Outlier Diagnostics in Large Regression Problems
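A fast approximate regression estimation method is proposed that identifies multiple outliers by minimizing a robust scale, avoiding masking effects; it is suited to problems with many independent variables, where traditional algorithms take too long.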
We propose a procedure for computing a fast approximation to regression estimates based on the minimization of a robust scale. The procedure can be applied with a large number of independent variables, where the usual algorithms require infeasible or extremely costly computing time. It can also be incorporated into any high-breakdown estimation method and may improve it at little additional computational cost. The procedure minimizes the robust scale over a set of tentative parameter vectors, each estimated by least squares after eliminating a set of possible outliers. These sets are obtained as follows. We represent each observation by the vector of changes in its least squares forecast when each of the data points is deleted. The sets of possible outliers are then the extreme points in the principal components of these vectors, or the points with large residuals. The good performance of the procedure allows the identification of multiple outliers, avoiding masking effects. We investigate the procedure's efficiency for robust estimation and its power as an outlier detection tool on a large real dataset and in a simulation study.
KEY WORDS: Masking; Outliers; Robust regression.
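The procedure outlined above can be illustrated with a minimal NumPy sketch, under several assumptions that are not taken from the paper. The sensitivity vector of observation i is built with the standard least squares deletion identity, so no refits are needed: the change in the forecast of observation i when point j is deleted is h_ij e_j / (1 - h_jj), where h_ij are hat-matrix entries and e_j are the LS residuals. The robust scale here is a simple median-absolute-residual stand-in, not necessarily the scale the authors minimize. "Extreme points" are taken as the m largest absolute deviations from the median on each of the first k principal components. The helper names and the parameters m and k are hypothetical, and the sketch makes a single pass with no iteration.

```python
import numpy as np


def ls_fit(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def robust_scale(r):
    """Stand-in robust scale (assumption): median |residual| / 0.6745."""
    return np.median(np.abs(r)) / 0.6745


def sensitivity_matrix(X, y):
    """Row i holds the change in the LS forecast of observation i when each
    point j is deleted: r_ij = h_ij * e_j / (1 - h_jj). Assumes all h_jj < 1."""
    H = X @ np.linalg.pinv(X.T @ X) @ X.T   # hat matrix
    e = y - H @ y                           # LS residuals
    return H * (e / (1.0 - np.diag(H)))     # column j scaled by e_j / (1 - h_jj)


def candidate_outlier_sets(X, y, m, k):
    """Hypothetical helper: index sets of m possible outliers each."""
    R = sensitivity_matrix(X, y)
    Rc = R - R.mean(axis=0)                 # center the sensitivity vectors
    # Principal directions of the rows of R via SVD of the centered matrix
    _, _, Vt = np.linalg.svd(Rc, full_matrices=False)
    sets = []
    for v in Vt[:k]:
        z = Rc @ v                          # scores on this component
        dev = np.abs(z - np.median(z))
        sets.append(np.argsort(dev)[-m:])   # m most extreme points
    e = y - X @ ls_fit(X, y)
    sets.append(np.argsort(np.abs(e))[-m:]) # m largest LS residuals
    return sets


def fast_robust_fit(X, y, m, k=3):
    """Refit LS without each candidate set; keep the fit whose residuals
    (over ALL observations) have the smallest robust scale."""
    best_beta = ls_fit(X, y)
    best_scale = robust_scale(y - X @ best_beta)
    for drop in candidate_outlier_sets(X, y, m, k):
        keep = np.setdiff1d(np.arange(len(y)), drop)
        beta = ls_fit(X[keep], y[keep])
        s = robust_scale(y - X @ beta)
        if s < best_scale:
            best_beta, best_scale = beta, s
    return best_beta, best_scale
```

As a usage example, take 90 points near the line y = 2 + 3x and add a tight cluster of 10 high-leverage points at (20, 0). Ordinary least squares is pulled toward a nearly flat line. The candidate set from the first principal component of the sensitivity vectors should contain the whole cluster, so the refit that minimizes the robust scale recovers coefficients close to (2, 3). Because the scale is evaluated on all observations, a candidate set that misses the cluster leaves large residuals and loses the comparison.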