Outlier Robust Finite Population Estimation

Journal of the American Statistical Association · 1986

Cited by 24

ABS rating: 4

Raymond L. Chambers (corresponding author)

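Summary

Proposes a model-based robust method for estimating a finite population total when the sample contains representative outliers, and shows through a simulation study that it outperforms conventional methods.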

Abstract

Outliers in sample data are a perennial problem for applied survey statisticians. Moreover, it is a problem for which traditional sample survey theory offers no real solution, beyond the sensible advice that such sample elements should not be weighted to their fullest extent in estimation.

Sample outliers can be classified into two basic types. Here we are concerned with the first type, which may conveniently be termed representative outliers. These are sample elements with values that have been correctly recorded and that cannot be assumed to be unique. That is, there is no good reason to assume there are no more similar outliers in the nonsampled part of the target population. The remaining sample outliers, which by default are termed nonrepresentative, are sample elements whose data values are incorrect or unique in some sense. Methods for dealing with these nonrepresentative outliers lie basically within the scope of survey editing and imputation theory and are, therefore, not considered in this article.

The specific problem considered here is that of robust estimation of a finite population total given sample data containing representative outliers. The approach is model based, in that it assumes the existence of a popular “kernel” superpopulation model that adequately describes the behavior of nonoutliers in the target population. An outlier robust version of the best linear unbiased estimator of the population total under this kernel model is proposed in Section 2. This robust estimator can be viewed as a finite population prediction analog of the well-known M-estimator approach to robust parametric estimation in infinite populations (see Huber 1981). Some asymptotic theory for the proposed estimator is given, based on a central limit theorem for its prediction error under a “gross error” type of outlier generation mechanism.
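To make the "robust version of the BLUE" idea concrete, here is a minimal sketch under assumptions that are not taken from the article. It assumes a ratio kernel model y_i = beta * x_i + e_i with Var(e_i) proportional to x_i, under which the BLUE of the total is the sample sum of y plus the ratio estimate of beta times the nonsample total of x. The sketch swaps the ratio estimate for a Huber M-estimate of beta (fitted by iteratively reweighted least squares with a MAD scale), in the spirit of Huber (1981). The names `huber_weight`, `robust_ratio_slope` and `robust_total`, the tuning constant c = 2, and the omission of any bias-correction term are all hypothetical choices for illustration. Chambers's actual estimator may be constructed differently.

```python
from statistics import median


def huber_weight(r: float, c: float) -> float:
    """IRLS weight psi(r)/r for Huber's psi function with tuning constant c."""
    return 1.0 if abs(r) <= c else c / abs(r)


def robust_ratio_slope(y, x, c=2.0, tol=1e-10, max_iter=100):
    """Huber M-estimate of beta in y_i = beta * x_i + e_i, Var(e_i) proportional to x_i (assumed model)."""
    beta = sum(y) / sum(x)  # start from the ratio estimate, the BLUE under this kernel model
    for _ in range(max_iter):
        # Residuals standardised by sqrt(x_i) to match the assumed variance structure.
        z = [(yi - beta * xi) / xi ** 0.5 for yi, xi in zip(y, x)]
        # Robust scale: median absolute deviation, rescaled for consistency under normal errors.
        m = median(z)
        sigma = median([abs(zi - m) for zi in z]) / 0.6745
        if sigma == 0.0:
            break  # the sample fits the kernel model exactly; nothing to downweight
        u = [huber_weight(zi / sigma, c) for zi in z]
        # Solve sum_i u_i * (y_i - beta * x_i) = 0 for beta with the weights held fixed.
        new_beta = sum(ui * yi for ui, yi in zip(u, y)) / sum(ui * xi for ui, xi in zip(u, x))
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta


def robust_total(y_sample, x_sample, x_nonsample_total, c=2.0):
    """Observed sample sum plus a robust model prediction for the nonsampled units (hypothetical)."""
    beta = robust_ratio_slope(y_sample, x_sample, c)
    return sum(y_sample) + beta * x_nonsample_total


y = [2.0, 4.0, 6.0, 8.0, 100.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sum(y) / sum(x))           # non-robust ratio slope: 8.0
print(robust_ratio_slope(y, x))  # pulled towards the slope of the four non-outlying points
```

The last unit inflates the ratio slope to 8, while the Huber fit moves back towards the slope of 2 shared by the other points. Because this simplified predictor treats every nonsampled unit as a non-outlier, it can be biased when outliers are skewed in one direction.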
This theory indicates a trade-off between bias and variance robustness for this estimator in situations in which the outlier values are not symmetrically distributed around their kernel model expectations. The article also contains some results from a comparative empirical study of the proposed robust estimator (Sec. 3). This study indicates that the use of this estimator leads to substantial gains over both conventional design-unbiased and “standard” kernel model-based estimation strategies in a population with a significant number of outliers.
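The bias/variance trade-off can be explored with a small Monte Carlo sketch. Everything here is an assumption made for illustration, not the article's study design. The population follows a ratio kernel model y = beta * x + e with e ~ N(0, x), and a "gross error" mechanism gives each unit, with probability eps, an extra positive shift. Because the shift is positive, the outliers are asymmetric. The sketch compares the ordinary ratio (BLUE) predictor of the total with a projection predictor that uses a Huber M-estimate of beta. The helper names, population size, sample size, eps, shift and seed are hypothetical. The block includes a compact Huber fit so that it runs on its own.

```python
import random
from statistics import fmean, median


def huber_ratio_slope(y, x, c=2.0, tol=1e-10, max_iter=100):
    """Huber IRLS fit of beta in y = beta * x + e, Var(e) proportional to x (hypothetical helper)."""
    beta = sum(y) / sum(x)
    for _ in range(max_iter):
        z = [(yi - beta * xi) / xi ** 0.5 for yi, xi in zip(y, x)]
        m = median(z)
        sigma = median([abs(zi - m) for zi in z]) / 0.6745  # MAD scale
        if sigma == 0.0:
            break
        u = [1.0 if abs(zi) <= c * sigma else c * sigma / abs(zi) for zi in z]  # Huber weights
        new_beta = sum(ui * yi for ui, yi in zip(u, y)) / sum(ui * xi for ui, xi in zip(u, x))
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta


def simulate(n_pop=500, n_sample=50, beta=2.0, eps=0.05, shift=30.0, reps=200, seed=1):
    """Monte Carlo bias and RMSE of two predictors of the population total (hypothetical setup)."""
    rng = random.Random(seed)
    x = [rng.uniform(1.0, 20.0) for _ in range(n_pop)]  # auxiliary variable known for every unit
    errors = {"ratio": [], "robust": []}
    for _ in range(reps):
        # Gross-error mechanism: kernel model y = beta*x + e, e ~ N(0, x), except that with
        # probability eps a unit receives an extra positive shift (asymmetric outliers).
        y = [beta * xi + rng.gauss(0.0, xi ** 0.5)
             + (shift * xi ** 0.5 if rng.random() < eps else 0.0) for xi in x]
        idx = rng.sample(range(n_pop), n_sample)  # simple random sampling without replacement
        in_sample = set(idx)
        ys = [y[i] for i in idx]
        xs = [x[i] for i in idx]
        x_rest = sum(x[i] for i in range(n_pop) if i not in in_sample)
        true_total = sum(y)
        for name, b in (("ratio", sum(ys) / sum(xs)), ("robust", huber_ratio_slope(ys, xs))):
            # Prediction error: observed sample sum + predicted nonsample sum - true total.
            errors[name].append(sum(ys) + b * x_rest - true_total)
    return {name: {"bias": fmean(e), "rmse": fmean([v * v for v in e]) ** 0.5}
            for name, e in errors.items()}


for name, stats in simulate().items():
    print(name, round(stats["bias"], 1), round(stats["rmse"], 1))
```

Under these made-up settings, one would expect the robust predictor to show a negative bias, because nonsampled outliers are predicted as if they followed the kernel model, in exchange for a smaller spread of prediction errors. The printed numbers depend entirely on the assumed parameters and are not results from the article.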

Keywords: survey sampling; robust statistics; outlier detection; population total estimation
