How to Deal with Missing Categorical Data: Test of a Simple Bayesian Method

ORGANIZATIONAL RESEARCH METHODS · 2003
Cited by 69
Journal rankings: RUC A · ABS 4

Summary

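Compares six methods for handling missing data in categorical variables. Listwise deletion proves efficient; if listwise deletion would lose too much data, the Bayesian method is recommended. Regression imputation is also effective, but its performance depends on the data structure and it introduces additional problems.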

Abstract

The authors analyze the efficiency of six missing-data techniques for categorical item nonresponse under the assumption that data are missing at random or missing completely at random. By efficiency, the authors mean that a procedure produces an unbiased estimate of true sample properties and is also easy to implement. The investigated techniques include listwise deletion, mode substitution, random imputation, two regression imputation methods, and a Bayesian model-based procedure. The authors analyze efficiency under six experimental conditions for a survey-based data set. They find that listwise deletion is efficient for the data analyzed. If data loss due to listwise deletion is an issue, the analysis points to the Bayesian method. Regression imputation is also efficient, but this result is conditioned on the specific data structure and may not hold in general. Regression imputation also raises additional problems, which make it less appropriate.
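Four of the techniques named in the abstract can be sketched for a single categorical item. This is a minimal illustration, not the authors' implementation. It assumes records stored as dicts with `None` marking a missing item, and all function names are hypothetical. "Random imputation" is assumed to mean drawing from the observed values. The Bayesian step is a generic Dirichlet-multinomial posterior draw, which may differ from the paper's "simple Bayesian method". Regression imputation is omitted.

```python
import random
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical helpers for illustration only; not the paper's procedures.
# A survey record: variable name -> category label, or None if the item is missing.
Record = Dict[str, Optional[str]]


def listwise_delete(rows: List[Record]) -> List[Record]:
    """Keep only records with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def _observed(rows: List[Record], var: str) -> List[str]:
    """Collect the non-missing values of `var`; fail if there are none."""
    values = [r[var] for r in rows if r[var] is not None]
    if not values:
        raise ValueError(f"no observed values for {var!r}")
    return values


def mode_substitute(rows: List[Record], var: str) -> List[Record]:
    """Fill missing `var` with its most frequent observed category."""
    mode = Counter(_observed(rows, var)).most_common(1)[0][0]
    return [{**r, var: mode if r[var] is None else r[var]} for r in rows]


def random_impute(rows: List[Record], var: str, rng: random.Random) -> List[Record]:
    """Fill each missing `var` with a value drawn uniformly from the pool of
    observed values, so categories appear in proportion to their observed frequency."""
    pool = _observed(rows, var)
    return [{**r, var: rng.choice(pool) if r[var] is None else r[var]} for r in rows]


def dirichlet_impute(rows: List[Record], var: str, rng: random.Random,
                     prior: float = 1.0) -> List[Record]:
    """Draw category probabilities from a Dirichlet(count + prior) posterior,
    then fill each missing `var` from that draw. Only categories seen in the
    observed data can be imputed."""
    counts = Counter(_observed(rows, var))
    cats = sorted(counts)
    # A Dirichlet draw is a vector of independent Gamma(alpha_k, 1) draws, normalized.
    gammas = [rng.gammavariate(counts[c] + prior, 1.0) for c in cats]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    return [{**r, var: rng.choices(cats, weights=probs)[0] if r[var] is None else r[var]}
            for r in rows]


# Toy data (hypothetical): four respondents, two categorical items.
data: List[Record] = [
    {"gender": "f", "dept": "sales"},
    {"gender": "m", "dept": None},
    {"gender": None, "dept": "hr"},
    {"gender": "f", "dept": "sales"},
]
rng = random.Random(0)

print(len(listwise_delete(data)))                # → 2
print(mode_substitute(data, "dept")[1]["dept"])  # → sales
print(random_impute(data, "gender", rng)[2]["gender"])
print(dirichlet_impute(data, "dept", rng)[1]["dept"])
```

Each imputation function uses only the marginal distribution of one item and ignores its relation to other variables. Under missing at random, a model that conditions on the observed covariates would generally be needed to avoid bias.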

Keywords: missing data handling; categorical variables; Bayesian methods; econometrics; data mining