Genuine Fakes: The Prevalence and Implications of Data Fabrication in a Large South African Survey

World Bank Economic Review · 2015
Cited by 26
Renmin University (人大) A-ABS 3

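Summary

The study finds that about 7% of the sample in a South African longitudinal survey was affected by data fabrication. Genuine data were obtained by re-interviewing the affected households. Comparing the two datasets shows that the fabrication has little effect on cross-sectional estimates but would severely distort panel estimates, and that the data quality checks had a benefit-cost ratio of at least 24.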

Abstract

How prevalent is data fabrication in household surveys? Would such fabrication substantially affect the validity of empirical analyses? We document how we identified such fabrication in South Africa's longitudinal National Income Dynamics Study, which affected about 7% of the sample. The fabrication was detected while fieldwork was still ongoing, and the relevant interviews were reconducted. We thus have an observed counterfactual that allows us to measure how problematic such fabrication would have been, had it remained undetected. We compare estimates from the dataset that includes the fabricated interviews with corresponding estimates that include the corrected data instead. We find that the fabrication would not have affected our univariate and cross-sectional estimates meaningfully, but would have led us to reach substantially different conclusions when implementing panel estimators. We estimate that the data quality investigation in this survey had a benefit-cost ratio of at least 24, and was thus easily justifiable.

Keywords: data fabrication; household surveys; panel estimation; data quality; South Africa