Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

Journal of the American Statistical Association · 2022

Cited by 6

ABS 4

Yuan Le
Chengchun Shi (corresponding author)
Shikai Luo
Hongtu Zhu
Rui Song

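Summary

For offline reinforcement learning settings, the paper proposes an advantage learning framework that uses pre-collected data to optimize the policy. The value of the resulting policy is guaranteed to converge at a faster rate than the value of the policy derived from the initial Q-estimator. The framework targets applications such as mobile health, where additional data cannot be collected online.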

Abstract

We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health applications with a pre-collected offline dataset remain unknown. The aim of this paper is to develop a novel advantage learning framework that makes efficient use of pre-collected data for policy optimization. The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithm as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than that of the policy derived from the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings. A Python implementation of our proposed method is available at https://github.com/leyuanheart/SEAL
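To make the input side of this description concrete, here is a minimal sketch, assuming a tabular Q-estimate stored as a NumPy array with one row per state and one column per action. It shows only the baseline that the abstract compares against (the policy derived from an initial Q-estimator) together with the standard advantage function A(s, a) = Q(s, a) - max_a' Q(s, a'). It is not the SEAL procedure itself; the function names `greedy_policy` and `advantage` and the numbers in `q_hat` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def greedy_policy(q_hat):
    """Return, for each state (row), the action with the largest estimated Q-value."""
    return np.argmax(q_hat, axis=1)


def advantage(q_hat):
    """Standard advantage: A(s, a) = Q(s, a) - max_a' Q(s, a').

    Every entry is non-positive, and the entry at the greedy action is zero.
    """
    return q_hat - q_hat.max(axis=1, keepdims=True)


# Hypothetical Q-estimate for 3 states and 2 actions (illustrative numbers only).
q_hat = np.array([
    [1.0, 2.0],
    [0.5, 0.1],
    [3.0, 3.5],
])

policy = greedy_policy(q_hat)  # baseline policy derived from the initial Q-estimate
adv = advantage(q_hat)         # how much worse each action is than the greedy one
```

In this toy setting the greedy policy is fully determined by the sign pattern of the advantage, which is one way to see why estimating the advantage well matters for policy optimization.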

Reinforcement learning · Offline learning · Policy optimization · Mobile health applications
