Statistical Tests for Replacing Human Decision Makers with Algorithms
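Proposes a statistical framework that uses artificial intelligence to improve human decision making: the diagnoses of a subset of doctors are replaced with the recommendations of a machine learning algorithm. Validated on nationwide prepregnancy checkup data, the algorithm achieves a higher true positive rate and a lower false positive rate than diagnoses made by doctors alone.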
This paper proposes a statistical framework for using artificial intelligence to improve human decision making. The performance of each human decision maker is benchmarked against that of machine predictions, and the diagnoses made by a subset of the decision makers are replaced with the recommendations of the machine learning algorithm. We apply both a heuristic frequentist approach and a Bayesian posterior loss function approach to abnormal birth detection, using a nationwide data set of doctor diagnoses from prepregnancy checkups of reproductive-age couples and the corresponding pregnancy outcomes. We find that, on a test data set, our algorithm achieves a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors alone.

This paper was accepted by Yan Chen, behavioral economics and decision analysis.

Funding: H. Hong’s work was supported by the National Science Foundation [Grant SES 1658950]. K. Tang’s work was supported by the National Natural Science Foundation of China [Grants 72192802, 72342008]. J. Wang’s work was partially supported by the National Natural Science Foundation of China [Grants 72222022, 72171013, 72242101].

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.01845.
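
To make the replacement idea in the abstract concrete, the following is a minimal sketch, not the paper's actual procedure. It assumes binary outcomes, binary doctor diagnoses, and a machine risk score per case. Each doctor's (false positive rate, true positive rate) pair is compared with the best true positive rate the machine attains at no higher a false positive rate, and a doctor is replaced when the machine is strictly better. The function names, the toy data, and the pointwise comparison rule are all illustrative assumptions; in particular, the sketch ignores sampling uncertainty, which the frequentist and Bayesian approaches in the paper are designed to address, and it benchmarks and evaluates on the same cases rather than on separate training and test sets.

```python
from collections import defaultdict


def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)


def best_threshold(y_true, scores, max_fpr):
    """Return (tpr, threshold) maximising machine TPR subject to FPR <= max_fpr."""
    best_tpr, best_thr = 0.0, float("inf")  # inf threshold = never flag a case
    for thr in sorted(set(scores), reverse=True):
        pred = [int(s >= thr) for s in scores]
        tpr, fpr = rates(y_true, pred)
        if fpr <= max_fpr and tpr > best_tpr:
            best_tpr, best_thr = tpr, thr
    return best_tpr, best_thr


def doctors_to_replace(doctor_ids, y_true, doctor_pred, scores):
    """Map each doctor the machine beats to the threshold used in their place."""
    cases = defaultdict(list)
    for i, d in enumerate(doctor_ids):
        cases[d].append(i)
    replaced = {}
    for d, idx in cases.items():
        doc_tpr, doc_fpr = rates([y_true[i] for i in idx],
                                 [doctor_pred[i] for i in idx])
        # Hypothetical rule: machine must match the doctor's FPR and beat the TPR.
        mach_tpr, thr = best_threshold(y_true, scores, doc_fpr)
        if mach_tpr > doc_tpr:
            replaced[d] = thr
    return replaced


# Toy data (made up for illustration): two doctors, four cases each.
doctor_ids = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
doctor_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.3, 0.4]

replaced = doctors_to_replace(doctor_ids, y_true, doctor_pred, scores)

# Hybrid decisions: machine for replaced doctors, the doctor's own call otherwise.
hybrid = [
    int(s >= replaced[d]) if d in replaced else p
    for d, p, s in zip(doctor_ids, doctor_pred, scores)
]

print("replaced:", replaced)
print("doctors only (TPR, FPR):", rates(y_true, doctor_pred))
print("hybrid (TPR, FPR):", rates(y_true, hybrid))
```

In this toy example only doctor B is replaced, since doctor A already matches the machine. A faithful implementation would replace the strict inequality with a statistical test or posterior loss comparison, as the abstract describes.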