Cautionary Tails about Arbitrary Deletion of Observations; or, Throwing the Variance Out with the Bathwater
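Examines how sample truncation common in empirical research (such as removing outliers) affects estimation results, demonstrates the resulting inconsistency using a variance components model of a wage equation, and develops a Bayesian estimator that reveals the prior beliefs implicit in a researcher's truncation of the data.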
A frequent practice in empirical work is to "preanalyze" the data via various sample inclusion rules. Truncation of "outliers" is common. These procedures are a form of sample censoring imposed by the investigator. Such censoring produces effects familiar from the sample selection literature. This paper investigates why an investigator might want to censor a sample and what the costs of doing so are. In an empirical example, using a variance components model of a wage equation, potential inconsistency problems are highlighted. The results indicate that while the slope coefficients, <tex-math>$\hat\beta$</tex-math>, may typically be less sensitive to censoring than the variance components, some common forms of censoring also markedly affect <tex-math>$\hat\beta$</tex-math>. Finally, a Bayesian estimator that incorporates prior information in a flexible way is developed. The usual Bayesian procedure is reversed: the Bayesian estimator is used to recover the prior beliefs that an investigator imposes by any proposed truncation of outliers. Especially in large samples, extremely dogmatic prior beliefs may be imposed when outliers are eliminated. Prior distributions of the type developed in the paper may be used by the investigator to clarify the nature of the prior beliefs revealed by a willingness to truncate data points and to assess whether or not any proposed truncation accurately reflects those beliefs.
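The cost of investigator-imposed censoring can be illustrated with a small simulation. The sketch below is not the paper's variance components model or its wage data. It assumes a single-regressor linear model with normal errors, a true slope of 0.5, a true error variance of 1.0, and a hypothetical inclusion rule that drops observations whose dependent variable lies more than 1.5 standard deviations from its mean. All names and parameter values are illustrative assumptions.

```python
import numpy as np


def ols(x, y):
    """Return (intercept, slope, residual variance) from a simple OLS fit."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)  # unbiased under the full-sample model
    return coef[0], coef[1], sigma2


# Simulated data (assumed values): true slope 0.5, true error variance 1.0.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# Hypothetical "outlier" rule: keep only observations whose y lies
# within 1.5 standard deviations of the sample mean of y.
keep = np.abs(y - y.mean()) < 1.5 * y.std()

full = ols(x, y)
trimmed = ols(x[keep], y[keep])

print(f"full sample   : slope={full[1]:.3f}, error variance={full[2]:.3f}")
print(f"trimmed sample: slope={trimmed[1]:.3f}, error variance={trimmed[2]:.3f}")
```

Under these assumptions, trimming on the dependent variable shrinks the estimated error variance and also attenuates the slope toward zero, which is the sample selection effect the abstract describes. Trimming on the regressor alone would leave the slope consistent in this simple setup, so the damage depends on which inclusion rule is applied.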