How to Deal with Missing Categorical Data: Test of a Simple Bayesian Method
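Six methods for handling missing data in categorical variables are compared. Listwise deletion is found to be efficient; if it causes severe data loss, the Bayesian method is recommended. Regression imputation is also effective, but it depends on the data structure and raises additional problems.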
The authors analyze the efficiency of six missing data techniques for categorical item nonresponse under the assumption that data are missing at random or missing completely at random. They call a procedure efficient if it produces an unbiased estimate of the true sample properties and is also easy to implement. The investigated techniques are listwise deletion, mode substitution, random imputation, two regression imputations, and a Bayesian model-based procedure. The authors analyze efficiency under six experimental conditions for a survey-based data set. They find that listwise deletion is efficient for the data analyzed. If data loss due to listwise deletion is an issue, the analysis points to the Bayesian method. Regression imputation is also efficient, but this result depends on the specific data structure and may not hold in general. Regression imputation also raises additional problems, which make it less appropriate.
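To make the three simplest techniques concrete, the following is a minimal Python sketch of listwise deletion, mode substitution, and random imputation for categorical data. It is an illustration under stated assumptions, not the authors' implementation: missing values are represented as `None`, random imputation is assumed to mean drawing from the observed values in proportion to their observed frequencies, and the function names (`listwise_delete`, `mode_substitute`, `random_impute`) are hypothetical. The regression and Bayesian procedures are not sketched because the abstract does not specify them.

```python
import random
from collections import Counter

MISSING = None  # assumption: a missing response is encoded as None


def listwise_delete(rows):
    """Drop every case (row) that has at least one missing item."""
    return [row for row in rows if all(value is not MISSING for value in row)]


def mode_substitute(column):
    """Replace each missing item with the most frequent observed category."""
    observed = [value for value in column if value is not MISSING]
    if not observed:
        raise ValueError("column has no observed values")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if value is MISSING else value for value in column]


def random_impute(column, rng=None):
    """Replace each missing item with a random draw from the observed items.

    Drawing uniformly from the list of observed values is equivalent to
    sampling categories in proportion to their observed frequencies.
    """
    rng = rng or random.Random()
    observed = [value for value in column if value is not MISSING]
    if not observed:
        raise ValueError("column has no observed values")
    return [rng.choice(observed) if value is MISSING else value for value in column]
```

Listwise deletion operates on whole cases and shrinks the sample, whereas the two imputation functions operate on a single item (column) and keep the sample size unchanged. This is the trade-off behind the abstract's remark about data loss.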