Debiasing ML- or AI-Generated Regressors in Partially Linear Models

Information Systems Research · 2026

Cited by 0

Journal lists: Renmin University (RUC) A · FT50 · UTD24 · ABS 4*

Wendao Xue · The Chinese University of Hong Kong
Yifan Yu · The University of Hong Kong
Jingwen Zhang · University of Illinois Urbana-Champaign
Yong Tan · University of Washington

Overview

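Addresses the measurement-error bias that arises when AI-generated variables are used as regressors. The paper proposes new estimators that need only a small human-annotated subsample to achieve unbiased and efficient estimation in partially linear models, and that apply to the experimental systems of platforms such as Tencent and Amazon.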
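To make this bias concrete, a standard partialling-out derivation helps; it is our own simplification for intuition, not the paper's estimator. Here D is the true regressor, \hat D its AI prediction, X the controls, and \tilde V := V - E[V \mid X] for any variable V:

```latex
\begin{aligned}
Y &= \theta D + g(X) + \varepsilon, \qquad E[\varepsilon \mid D, X] = 0,\\
\tilde Y &= \theta \tilde D + \varepsilon,\\
\theta_{\text{naive}} &= \frac{E[\tilde Y\,\widetilde{\hat D}]}{E[\widetilde{\hat D}^{\,2}]}
  = \theta\,\lambda,
  \qquad \lambda := \frac{E[\tilde D\,\widetilde{\hat D}]}{E[\widetilde{\hat D}^{\,2}]}
  \qquad \text{if } E[\varepsilon\,\widetilde{\hat D}] = 0 .
\end{aligned}
```

Under classical noise (\hat D = D + u, with u mean-zero and independent of D, X, and \varepsilon), \lambda = E[\tilde D^2] / (E[\tilde D^2] + \operatorname{Var}(u)) < 1, so the naive estimate is attenuated. Other error structures can push \lambda above 1 (for example, predictions shrunk toward the mean, \hat D = cD with c < 1, give \lambda = 1/c), so the naive estimate can overstate the effect as well. Because \lambda involves only D and \hat D, a human-labeled subsample on which both are observed can estimate it, which suggests the simple ratio correction \theta = \theta_{\text{naive}} / \lambda.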

Abstract

Organizations increasingly use machine learning (ML) and artificial intelligence (AI), including large language models, to generate variables for regression models that inform business and policy decisions. For example, practitioners may use AI to predict review sentiment, ad aesthetics, or emotional expressions, and then estimate their causal effects on outcomes such as sales or engagement. However, because AI predictions are imperfect, directly using these AI-generated variables as regressors introduces measurement error that can systematically bias causal estimates, potentially leading to over- or underinvestment in business strategies. We develop new estimators that correct this bias in partially linear regression models, which are widely deployed in experimental systems at major platforms, including Tencent, Amazon AWS, and Microsoft. Our approach requires only a small human-annotated subsample alongside the large AI-labeled data set to achieve unbiased and efficient estimation. We demonstrate that our methods work with both traditional ML algorithms and LLM-based predictions. Our framework can be directly integrated into existing analytics and experimental systems, enabling practitioners to leverage the scalability of AI-generated data while maintaining reliable causal conclusions. This work also has implications for AI fairness, as our approach can help correct biases from any source in AI predictions.
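The bias and its correction from a small human-labeled subsample can be illustrated with a short simulation. This is a minimal sketch, not the authors' estimator: it swaps in a simple moment-ratio (regression-calibration-style) correction. The naive slope of the partialled-out outcome on the partialled-out AI label converges to theta * lambda, where lambda is the slope of the true regressor on the AI label after partialling out X. Lambda is then estimated on a random labeled subsample and divided out. Everything in the setup is a hypothetical assumption: a binary true regressor, an AI labeler that flips labels with probability 0.2 independently of the outcome noise, a randomly drawn labeled subsample, and a degree-5 polynomial regression standing in for the flexible ML nuisance models normally used to partial out X. The helper names (`residualize`, `slope`, `run_demo`) are illustrative only.

```python
import numpy as np


def residualize(v, basis):
    """Subtract the least-squares projection of v on `basis`.

    A crude stand-in for v - E[v | X]; in practice a flexible ML
    nuisance model would replace this fixed polynomial basis.
    """
    coef, *_ = np.linalg.lstsq(basis, v, rcond=None)
    return v - basis @ coef


def slope(a, b):
    """Least-squares slope of a on b (both already residualized)."""
    return float(a @ b) / float(b @ b)


def run_demo(theta=2.0, n=50_000, m=1_000, flip_prob=0.2, seed=0):
    rng = np.random.default_rng(seed)

    # Controls X and a binary "true" regressor D whose prevalence depends on X.
    x = rng.uniform(0.0, 1.0, n)
    d = rng.binomial(1, 0.3 + 0.4 * x).astype(float)

    # Hypothetical AI labeler: flips the true label with probability flip_prob,
    # independently of the outcome noise (non-differential error).
    flipped = rng.random(n) < flip_prob
    d_hat = np.where(flipped, 1.0 - d, d)

    # Partially linear outcome: Y = theta * D + g(X) + noise, with g nonlinear.
    y = theta * d + np.sin(2.0 * np.pi * x) + rng.normal(0.0, 1.0, n)

    basis = np.vander(x, 6)  # columns x^5, ..., x^0 (includes an intercept)

    # Naive estimate on the full AI-labeled data: regress residualized Y on
    # residualized d_hat. Under these assumptions it converges to theta * lambda.
    theta_naive = slope(residualize(y, basis), residualize(d_hat, basis))

    # Small random subsample where humans supply the true D: estimate lambda
    # as the slope of residualized D on residualized d_hat.
    idx = rng.choice(n, size=m, replace=False)
    lam = slope(residualize(d[idx], basis[idx]),
                residualize(d_hat[idx], basis[idx]))

    return {"theta_naive": theta_naive, "lambda": lam,
            "theta_corrected": theta_naive / lam}


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

With these settings, the naive estimate should fall well below the true theta = 2 (near theta * lambda, with lambda around 0.58). The corrected estimate should land close to 2. The sketch reports point estimates only. Valid standard errors would also have to account for the sampling noise in the lambda estimated from the small subsample.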

Causal inference · Econometrics · Machine learning · Business analytics
