Post Reinforcement Learning Inference

Operations Research · 2025

Citations: 0

Renmin University A-list · FT50 · UTD24 · ABS 4*

Vasilis Syrgkanis · Stanford University
Ruohan Zhan · University College London

Overview

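For adaptive data generated by reinforcement learning algorithms, the paper proposes an adaptively weighted generalized method of moments estimator that enables valid inference on policy values and dynamic treatment effects. The method applies to dynamic off-policy evaluation and personalized decision systems.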

Abstract

In “Post Reinforcement Learning Inference,” Vasilis Syrgkanis and Ruohan Zhan develop a new inferential framework for data collected via reinforcement learning (RL) algorithms, the adaptive systems that update strategies as outcomes unfold. Traditional statistical methods fail in this setting because adaptivity induces time-varying variance and dependence across samples. Syrgkanis and Zhan propose an adaptively weighted generalized method of moments (AW-GMM) estimator that stabilizes this variance through data-dependent weights. They prove that the weighted estimator achieves consistency and asymptotic normality, enabling valid hypothesis testing and confidence intervals for policy values and dynamic treatment effects. Their method provides a unified approach for structural estimation and inference under nonstationary, adaptively generated sequence data, with applications to dynamic off-policy evaluation and personalized decision systems.
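The variance-stabilizing idea in the abstract can be seen in its simplest form: estimating one arm's mean reward from bandit data, which is a single, just-identified moment condition. The sketch below is not the authors' AW-GMM estimator. It assumes an epsilon-greedy two-armed bandit that logs its assignment probabilities e_t, an AIPW score whose outcome model is a running mean fit only on past data, and weights h_t = sqrt(e_t), a common variance-stabilizing choice adopted here as an assumption. The function names `simulate_bandit` and `aw_aipw` are hypothetical.

```python
import numpy as np


def simulate_bandit(n, means, eps_floor=0.05, seed=0):
    """Epsilon-greedy bandit with unit-variance Gaussian rewards.

    Logs the assignment probabilities e_t(a) for every arm, since the
    weighted estimator needs them.
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    sums, counts = np.zeros(k), np.zeros(k)
    arms = np.empty(n, dtype=int)
    rewards = np.empty(n)
    probs = np.empty((n, k))
    for t in range(n):
        eps = max(eps_floor, 1.0 / np.sqrt(t + 1))  # decaying exploration rate
        est = np.divide(sums, counts, out=np.zeros(k), where=counts > 0)
        p = np.full(k, eps / k)
        p[np.argmax(est)] += 1.0 - eps  # greedy arm gets the remaining mass
        a = rng.choice(k, p=p)
        y = rng.normal(means[a], 1.0)
        arms[t], rewards[t], probs[t] = a, y, p
        sums[a] += y
        counts[a] += 1
    return arms, rewards, probs


def aw_aipw(arms, rewards, probs, arm, z=1.96):
    """Adaptively weighted AIPW estimate of one arm's mean, with a normal CI."""
    n = len(arms)
    gamma = np.empty(n)
    s, c = 0.0, 0
    for t in range(n):
        mu = s / c if c > 0 else 0.0  # outcome model uses data before t only
        e = probs[t, arm]
        hit = float(arms[t] == arm)
        gamma[t] = mu + hit / e * (rewards[t] - mu)  # AIPW score
        if hit:
            s += rewards[t]
            c += 1
    # Assumed weights h_t = sqrt(e_t): down-weight rarely sampled periods.
    h = np.sqrt(probs[:, arm])
    theta = np.sum(h * gamma) / np.sum(h)
    se = np.sqrt(np.sum(h**2 * (gamma - theta) ** 2)) / np.sum(h)
    return theta, (theta - z * se, theta + z * se)


arms, rewards, probs = simulate_bandit(5000, means=[0.0, 0.3])
theta, ci = aw_aipw(arms, rewards, probs, arm=0)
print(theta, ci)
```

Arm 0 is the worse arm, so its assignment probability shrinks toward eps_floor / 2 over time. Down-weighting those periods keeps the weighted score variance from being dominated by the few rare pulls, which is what makes the normal-approximation interval plausible. An unweighted average of the same scores can lose asymptotic normality under this kind of adaptive sampling.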

Reinforcement learning · Statistical inference · Dynamic policy evaluation · Econometrics
