Testing stationarity and change point detection in reinforcement learning

Annals of Statistics · 2025

Cited by 2

ABS 4*

Mengbing Li
Chengchun Shi
Zhenke Wu
Piotr Fryźlewicz

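Summary

Addressing reinforcement learning in nonstationary environments, the paper proposes a model-free test, based on historical data, for assessing the stationarity of the optimal Q-function. It also develops a change point detection method that can be coupled with existing algorithms, and validates the approach through theory, simulations, and real data.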

Abstract

We consider reinforcement learning (RL) in possibly nonstationary environments. Many existing RL algorithms in the literature rely on the stationarity assumption that requires the state transition and reward functions to be constant over time. However, this assumption is restrictive in practice and is likely to be violated in a number of applications, including traffic signal control, robotics and mobile health. In this paper, we develop a model-free test to assess the stationarity of the optimal Q-function based on pre-collected historical data, without additional online data collection. Based on the proposed test, we further develop a change point detection method that can be naturally coupled with existing state-of-the-art RL methods designed in stationary environments for online policy optimization in nonstationary environments. The usefulness of our method is illustrated by theoretical results, simulation studies, and a real data example from the 2018 Intern Health Study. A Python implementation of the proposed procedure is publicly available at https://github.com/limengbinggz/CUSUM-RL.
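
To give a feel for what change point detection involves, the sketch below runs a textbook CUSUM-type scan for a single mean shift in a scalar sequence. It is an illustration only and is not the procedure from the paper, which tests the stationarity of the optimal Q-function; the function name `cusum_change_point` and the simulated data are assumptions made for this example.

```python
import numpy as np


def cusum_change_point(x):
    """Scan all split points of a 1-D sequence for a single mean shift.

    Hypothetical helper for illustration; not taken from the paper or
    its repository. Returns (max_statistic, k), where k is the number of
    observations in the left segment at the best split.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")

    csum = np.cumsum(x)
    total = csum[-1]
    k = np.arange(1, n)                      # candidate left-segment sizes
    left_mean = csum[:-1] / k                # mean of x[:k]
    right_mean = (total - csum[:-1]) / (n - k)  # mean of x[k:]

    # Weighted absolute difference of segment means (classical CUSUM form).
    stats = np.sqrt(k * (n - k) / n) * np.abs(left_mean - right_mean)
    best = int(np.argmax(stats))
    return float(stats[best]), int(k[best])


if __name__ == "__main__":
    # Simulated sequence whose mean shifts from 0 to 3 after 50 observations.
    rng = np.random.default_rng(0)
    series = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])
    stat, k = cusum_change_point(series)
    print(f"max CUSUM statistic = {stat:.2f}, estimated change point = {k}")
```

In practice the maximum statistic would be compared against a critical value (for example, one obtained by a bootstrap or permutation scheme) before declaring a change; that calibration step is omitted here.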

Keywords: reinforcement learning, nonstationary environments, change point detection, statistical testing
