Imputation in U.S. Manufacturing Data and Its Implications for Productivity Dispersion

Review of Economics and Statistics · 2017
Cited by 37
Renmin University journal list: A · FT50 · ABS 4

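Summary

The study finds that the U.S. Census Bureau's mean imputation of manufacturing data understates productivity dispersion, whereas multiple imputation based on classification and regression trees reflects that dispersion more faithfully.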
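Why mean imputation shrinks dispersion can be seen with a toy example. Filling the missing entries with the observed mean adds observations that sit exactly at the mean. The sum of squared deviations stays the same while the count grows, so the variance falls by the factor n_observed / n_total. This is a minimal sketch that assumes one variable and a single unconditional mean. The bureau's actual procedures are more elaborate, and the plant values below are hypothetical.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical log-output values for ten plants; four are missing (None).
log_output = [2.0, 4.0, None, 6.0, None, 8.0, 3.0, None, 7.0, None]

observed = [v for v in log_output if v is not None]
imputed = mean_impute(log_output)

# Population standard deviations before and after imputation.
sd_observed = statistics.pstdev(observed)
sd_imputed = statistics.pstdev(imputed)
print(f"SD, observed plants only: {sd_observed:.3f}")  # 2.160
print(f"SD, after mean imputation: {sd_imputed:.3f}")  # 1.673
```

Here 6 of 10 values are observed, so the imputed variance is exactly 6/10 of the observed-only variance.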

Abstract

In the U.S. Census Bureau’s 2002 and 2007 Censuses of Manufactures, 79% and 73% of observations, respectively, have imputed data for at least one variable used to compute total factor productivity (TFP). The bureau primarily imputes for missing values using mean-imputation methods, which can reduce the underlying variance of the imputed variables. For five variables entering TFP, we show that dispersion is significantly smaller in the Census mean-imputed versus the nonimputed data. We use classification and regression trees (CART) to produce multiple imputations with observed data for similar plants. For 90% of the 473 industries in 2002 and 84% of the 471 industries in 2007, we find that TFP dispersion increases as we move from Census mean-imputed data to nonimputed data to the CART-imputed data.
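CART-based multiple imputation can be sketched in pure Python. The sketch grows a regression tree on plants with observed values. For each missing value it drops the plant down the tree and draws an actual observed value from the plants in the same leaf, rather than inserting a mean. It refits the tree on a bootstrap sample for each of the m imputations so the completed datasets also differ in the tree itself. This is an illustration under stated assumptions and not the authors' code. It assumes one target variable, numeric predictors, splits chosen by squared-error reduction, and uniform donor draws. The function names, tree settings (max_depth, min_leaf), and data are hypothetical.

```python
import random
import statistics


def sse(ys):
    """Within-node sum of squared errors around the node mean."""
    if not ys:
        return 0.0
    mean = statistics.fmean(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, ys, min_leaf):
    """Return the (feature, threshold) split that most reduces SSE, or None."""
    best, best_sse = None, sse(ys)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            total = sse(left) + sse(right)
            if total < best_sse:
                best, best_sse = (j, t), total
    return best


def grow(rows, ys, depth, max_depth, min_leaf):
    """Grow a regression tree whose leaves keep their observed y values as donors."""
    split = best_split(rows, ys, min_leaf) if depth < max_depth else None
    if split is None:
        return {"donors": list(ys)}
    j, t = split
    go_left = [r[j] <= t for r in rows]
    children = {}
    for side, flag in (("left", True), ("right", False)):
        sub_rows = [r for r, g in zip(rows, go_left) if g == flag]
        sub_ys = [y for y, g in zip(ys, go_left) if g == flag]
        children[side] = grow(sub_rows, sub_ys, depth + 1, max_depth, min_leaf)
    return {"feature": j, "threshold": t, **children}


def leaf_donors(tree, x):
    """Drop a record down the tree and return the donor pool in its leaf."""
    while "donors" not in tree:
        side = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["donors"]


def cart_multiple_impute(X, y, m=5, max_depth=3, min_leaf=2, seed=0):
    """Return m completed copies of y; None marks a missing value."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    completed = []
    for _ in range(m):
        # Refit on a bootstrap sample of observed plants for each imputation.
        boot = rng.choices(observed, k=len(observed))
        tree = grow([X[i] for i in boot], [y[i] for i in boot], 0, max_depth, min_leaf)
        filled = list(y)
        for i, v in enumerate(y):
            if v is None:
                # Draw an actual observed value from similar plants, not a mean.
                filled[i] = rng.choice(leaf_donors(tree, X[i]))
        completed.append(filled)
    return completed


# Hypothetical plants: predictor = log employment, target = log output.
X = [[1.0], [1.2], [1.1], [3.0], [3.2], [2.9], [5.0], [5.1], [4.9],
     [1.05], [3.1], [5.05]]
y = [2.0, 2.3, 2.1, 5.0, 5.4, 4.8, 8.0, 8.3, 7.9, None, None, None]

imputations = cart_multiple_impute(X, y, m=5)
for k, filled in enumerate(imputations):
    print(k, filled[9:])
```

Because every imputed value is an observed value from comparable plants, the completed data keep the spread of the donor pool instead of collapsing it toward a mean. Downstream statistics, such as a dispersion measure of TFP, would be computed on each of the m completed datasets and then combined.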

Keywords: data imputation · total factor productivity · productivity dispersion · census of manufactures