Machine learning and the optimization of prediction-based policies
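Proposes a method for optimizing public policies by linking the prediction errors of classification models to social welfare, and applies it to predicting tax evasion by Italian self-employed individuals; finds that random forest models can substantially increase tax revenue and reveal new features of tax evasion.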
We present a procedure for the optimal implementation of public policies that involve predicting an individual behavior or characteristic. By linking the prediction errors of any given classification model to the resulting social welfare, we provide a simple measure to rank different models and select the optimal one. This measure is defined as the difference between the social welfare of a given policy and that of an error-free policy, and it is related to the ROC curve employed in the machine learning literature. We extend the cost isometrics approach described in the literature to the case of heterogeneous costs of type I and type II errors. We apply our approach to predicting inaccurate tax returns filed by Italian self-employed individuals and sole proprietorships. We show that the approach can yield substantial increases in revenue, and that random forest models, beyond providing comparatively good predictions, offer important insights. In our case, they both provide empirical support for existing theories of tax evasion (highlighting, for instance, cross-sectoral heterogeneity) and extend our understanding of the phenomenon (for example, the role of bunching).
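The abstract does not spell out the welfare measure, so the following is only a minimal sketch under a simplifying assumption of our own: the policy audits (flags) every taxpayer whose model score reaches a threshold, and it departs from the error-free policy only through misclassifications, each carrying an individual-specific cost. Under that assumption, the welfare gap equals the total cost of type I errors (compliant taxpayers audited) plus type II errors (evaders missed). The function names `welfare_loss` and `best_threshold` and the parameters `cost_fp` and `cost_fn` are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence, Tuple


def welfare_loss(y: Sequence[int], scores: Sequence[float],
                 cost_fp: Sequence[float], cost_fn: Sequence[float],
                 threshold: float) -> float:
    """Welfare gap between a threshold policy and the error-free policy.

    Hypothetical assumption: the gap is the sum of individual-specific
    misclassification costs (heterogeneous type I and type II costs).
    """
    loss = 0.0
    for yi, si, cfp, cfn in zip(y, scores, cost_fp, cost_fn):
        flagged = si >= threshold       # policy audits this taxpayer
        if flagged and yi == 0:         # type I error: compliant taxpayer audited
            loss += cfp
        elif not flagged and yi == 1:   # type II error: inaccurate return missed
            loss += cfn
    return loss


def best_threshold(y: Sequence[int], scores: Sequence[float],
                   cost_fp: Sequence[float],
                   cost_fn: Sequence[float]) -> Tuple[float, float]:
    """Scan each distinct score (plus +inf, i.e. audit nobody) and return
    (minimal welfare loss, threshold achieving it)."""
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    return min((welfare_loss(y, scores, cost_fp, cost_fn, t), t)
               for t in candidates)


if __name__ == "__main__":
    # Toy data (invented for illustration): 1 = inaccurate return.
    y = [1, 0, 1, 0, 1]
    cost_fp = [1, 1, 1, 1, 1]      # cost of auditing each taxpayer
    cost_fn = [5, 0, 2, 0, 8]      # revenue forgone if an evader is missed
    model_a = [0.9, 0.8, 0.7, 0.3, 0.2]
    model_b = [0.6, 0.1, 0.4, 0.2, 0.9]
    # Rank models by their minimal welfare loss (lower is better).
    print(best_threshold(y, model_a, cost_fp, cost_fn))  # (2.0, 0.2)
    print(best_threshold(y, model_b, cost_fp, cost_fn))  # (0.0, 0.4)
```

When costs are homogeneous (a single c_FP and c_FN), the loss reduces to c_FP · N0 · FPR + c_FN · N1 · (1 − TPR), where N0 and N1 are the numbers of compliant and non-compliant taxpayers, so points of equal loss lie on straight lines in ROC space with slope c_FP · N0 / (c_FN · N1). These are the cost isometrics; with heterogeneous costs the errors must instead be weighted individually, as in the loop above.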