What, Why, and How: An Empiricist’s Guide to Double/Debiased Machine Learning

Information Systems Research · 2025

Cited by 5

Journal rankings: Renmin University (RUC) A · FT50 · UTD24 · ABS 4*

Bowen Shi · Tsinghua University
Xiaojie Mao · Tsinghua University
Bo Li · Tsinghua University
Mochen Yang · University of Minnesota

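Summary

Introduces the double/debiased machine learning (DML) framework, which helps researchers estimate effects in high-dimensional data. By combining machine learning with statistical inference, it offers causal estimates that are more robust than those from traditional regression.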

Abstract

We provide an introduction to double/debiased machine learning (DML), a framework that enables effect estimation when dealing with complex, high-dimensional data. In many empirical analyses, especially in fields such as information systems, researchers face difficult choices about which control variables to include and how to model their relationships with the outcome. These modeling decisions can significantly change results, leading to uncertainty about which findings are reliable. DML offers a practical solution by combining modern machine learning with rigorous statistical inference. The idea is to let flexible ML models (such as random forests or gradient boosting) capture complex relationships among control variables while still delivering reliable estimates for the key effect of interest. DML can be applied to many familiar research designs, including standard regression with controls, instrumental variables, difference in differences, and models that incorporate ML-generated features. Empirical studies and simulations show that DML is typically more robust to misspecification than traditional regression and more reliable than earlier semiparametric methods. However, DML is not automatic—it still requires sound research design and high-quality machine learning estimation. Used thoughtfully, DML provides a powerful, flexible, and statistically grounded approach for empirical research in modern data environments.
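To make the core idea concrete, here is a minimal sketch of cross-fitted DML for a partially linear model, y = theta * d + g(X) + error. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fit_predict_poly`, `dml_plr`, `simulate`) are hypothetical, the data are simulated, and a ridge regression on polynomial features stands in for the flexible learners the abstract mentions (random forests, gradient boosting) so that the example needs only NumPy.

```python
import numpy as np


def fit_predict_poly(X_train, y_train, X_test, degree=3, ridge=1e-3):
    """Stand-in nuisance learner (an assumption of this sketch):
    ridge regression on per-column polynomial features."""
    def features(X):
        cols = [np.ones(len(X))]
        for p in range(1, degree + 1):
            cols.append(X ** p)  # element-wise powers of every column
        return np.column_stack(cols)

    A, B = features(X_train), features(X_test)
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y_train)
    return B @ coef


def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance models are fit on the other folds and evaluated on the
        # held-out fold, so each residual is an out-of-sample residual.
        y_res[test_idx] = y[test_idx] - fit_predict_poly(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict_poly(X[train], d[train], X[test_idx])
    # Regress outcome residuals on treatment residuals.
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Plug-in standard error based on the orthogonal score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


def simulate(n=2000, theta=1.0, seed=0):
    """Hypothetical data: controls affect both treatment and outcome nonlinearly."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    d = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)
    y = theta * d + X[:, 0] ** 2 - X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)
    return y, d, X


if __name__ == "__main__":
    y, d, X = simulate()
    theta_hat, se = dml_plr(y, d, X)
    print(f"theta_hat = {theta_hat:.3f}, 95% CI half-width = {1.96 * se:.3f}")
```

The two ingredients doing the work are residualizing both the outcome and the treatment on the controls, and cross-fitting, so that no observation's residual comes from a model trained on that observation. In applied work one would likely swap the stand-in learner for a tuned ML model and use an established DML package rather than this sketch.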

Empirical research · Machine learning · Causal inference · Econometrics · Information systems
