Post Reinforcement Learning Inference
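For adaptive data generated by reinforcement learning algorithms, the paper proposes an adaptively weighted generalized method of moments estimator that enables valid inference on policy values and dynamic treatment effects, with applications to dynamic off-policy evaluation and personalized decision systems.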
In “Post Reinforcement Learning Inference,” Vasilis Syrgkanis and Ruohan Zhan develop a new inferential framework for data collected via reinforcement learning (RL) algorithms, adaptive systems that update their strategies as outcomes unfold. Traditional statistical methods fail in this setting because adaptivity induces time-varying variance and dependence across samples. Syrgkanis and Zhan propose an adaptively weighted generalized method of moments (AW-GMM) estimator that stabilizes this variance through data-dependent weights. They prove that the weighted estimator is consistent and asymptotically normal, enabling valid hypothesis tests and confidence intervals for policy values and dynamic treatment effects. Their method provides a unified approach to structural estimation and inference with nonstationary, adaptively generated sequential data, with applications to dynamic off-policy evaluation and personalized decision systems.
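To make the idea of data-dependent weights concrete, the following is a minimal sketch and not the authors' AW-GMM estimator. It simulates a two-armed bandit with an epsilon-greedy rule and estimates one arm's mean reward with a weighted inverse-propensity average. The bandit setup, the function names `simulate_bandit` and `weighted_ipw_value`, and the square-root-of-propensity weight are all assumptions chosen for illustration. The only property the sketch relies on is that each weight depends on the past alone, so the weighted scores remain unbiased for the arm's mean.

```python
import math
import random


def simulate_bandit(T=20000, means=(0.3, 0.5), floor=0.05, seed=0):
    """Run epsilon-greedy on a two-armed Bernoulli bandit (hypothetical setup).

    Returns a log of (arm, reward, propensities), where propensities are the
    action probabilities the algorithm used at that step.
    """
    rng = random.Random(seed)
    counts = [0, 0]
    sums = [0.0, 0.0]
    log = []
    for t in range(1, T + 1):
        # Exploration rate decays with t but never drops below 2 * floor,
        # so every arm keeps probability at least `floor`.
        eps = max(2 * floor, t ** -0.5)
        est = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(2)]
        greedy = 0 if est[0] >= est[1] else 1
        probs = [eps / 2, eps / 2]
        probs[greedy] += 1 - eps
        arm = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        log.append((arm, reward, tuple(probs)))
    return log


def weighted_ipw_value(log, arm, weight_fn):
    """Weighted average of inverse-propensity scores for one arm.

    weight_fn maps the logged propensity of `arm` to a nonnegative weight.
    The weight uses only information fixed before the action is drawn.
    """
    num = 0.0
    den = 0.0
    for a, y, probs in log:
        e = probs[arm]
        score = y / e if a == arm else 0.0  # conditionally unbiased score
        h = weight_fn(e)
        num += h * score
        den += h
    return num / den


log = simulate_bandit()
uniform = weighted_ipw_value(log, 0, lambda e: 1.0)  # plain IPW average
stabilized = weighted_ipw_value(log, 0, math.sqrt)   # assumed weight: sqrt(e_t)
print(f"uniform: {uniform:.3f}  stabilized: {stabilized:.3f}")
```

With weight sqrt(e_t), the conditional variance of each weighted score is roughly e_t times a term of order 1/e_t, so it stays of constant order even as the propensity of the less-played arm shrinks. That is the variance-stabilizing intuition in its simplest form; the paper's estimator applies adaptive weights to general moment conditions rather than to a single mean.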