Uncertainty Quantification and Exploration for Reinforcement Learning

Operations Research · 2023

Cited by 7

Journal rankings: Renmin University of China A · FT50 · UTD24 · ABS 4*

Yi Zhu · Northwestern University (USA)
Jing Dong
Henry Lam · Columbia University

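Summary

The paper studies the large-sample asymptotic distributions of estimators of the state-action value function and the optimal value function in reinforcement learning, and uses them to quantify the uncertainty of decisions. Building on this, it develops a pure exploration policy that collects informative data by maximizing the relative discrepancy among the estimated Q-values, which raises the probability of learning the optimal policy.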

Abstract

Quantify the uncertainty to decide and explore better. In statistical inference, large-sample behavior and confidence interval construction are fundamental to assessing the error and reliability of estimated quantities with respect to noise in the data. In the paper “Uncertainty Quantification and Exploration for Reinforcement Learning,” Dong, Lam, and Zhu study large-sample behavior in the classic reinforcement learning setting. They derive large-sample asymptotic distributions for estimators of the state-action value function (Q-value) and the optimal value function when data are collected from the underlying Markov chain. These distributions allow one to assess how confidently the performance of different decisions can be told apart. The tight uncertainty quantification also supports a pure exploration policy that maximizes the worst-case relative discrepancy among the estimated Q-values (the ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data so as to maximize the probability of learning the optimal reward-collecting policy, and it achieves good empirical performance.
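The relative discrepancy mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a small tabular MDP with known mean rewards, and it swaps the paper's asymptotic distributions for a parametric bootstrap over empirical transition counts, so it is a stand-in for illustration and not the authors' estimator. All function names, the toy transition matrix, and the sample sizes are hypothetical.

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Optimal Q-values of a tabular MDP.
    P: (S, A, S) transition probabilities, R: (S, A) mean rewards."""
    Q = np.zeros(R.shape)
    for _ in range(max_iter):
        Q_new = R + gamma * (P @ Q.max(axis=1))  # Bellman optimality update
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

def estimate_transitions(counts):
    """Empirical transition probabilities from (S, A, S) visit counts.
    Unvisited (s, a) pairs fall back to a uniform row (an assumption)."""
    totals = counts.sum(axis=2, keepdims=True)
    n_states = counts.shape[2]
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)

def bootstrap_q(counts, R, gamma=0.9, n_boot=200, seed=0):
    """Point estimate of Q plus parametric-bootstrap replicates.
    The bootstrap replaces the paper's large-sample theory here."""
    rng = np.random.default_rng(seed)
    P_hat = estimate_transitions(counts)
    n = counts.sum(axis=2).astype(int)
    S, A, _ = counts.shape
    draws = np.empty((n_boot, S, A))
    for b in range(n_boot):
        resampled = np.zeros_like(counts, dtype=float)
        for s in range(S):
            for a in range(A):
                if n[s, a] > 0:
                    resampled[s, a] = rng.multinomial(n[s, a], P_hat[s, a])
        draws[b] = q_value_iteration(estimate_transitions(resampled), R, gamma)
    return q_value_iteration(P_hat, R, gamma), draws

def relative_discrepancy(Q_hat, draws):
    """Squared Q-value gap to the greedy action divided by the bootstrap
    variance of that gap. Greedy actions are set to infinity."""
    S, A = Q_hat.shape
    best = Q_hat.argmax(axis=1)
    ratio = np.full((S, A), np.inf)
    for s in range(S):
        for a in range(A):
            if a == best[s]:
                continue
            gap = Q_hat[s, best[s]] - Q_hat[s, a]
            var = (draws[:, s, best[s]] - draws[:, s, a]).var(ddof=1)
            ratio[s, a] = gap**2 / var if var > 0 else np.inf
    return ratio

# Toy usage: 2 states, 2 actions, 50 observed transitions per (s, a) pair.
rng = np.random.default_rng(1)
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
counts = np.array([[rng.multinomial(50, P_true[s, a]) for a in range(2)]
                   for s in range(2)])
Q_hat, draws = bootstrap_q(counts, R, n_boot=100)
ratio = relative_discrepancy(Q_hat, draws)
print("worst-case relative discrepancy:", ratio.min())
```

A small ratio flags a state-action pair whose gap from the greedy action is hard to distinguish from sampling noise. The smallest entry is the worst case, which an exploration policy of the kind the abstract describes would try to raise by sampling the corresponding pairs more often.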

Reinforcement learning · Uncertainty quantification · Statistical inference · Exploration policy
