Adaptive Discretization in Online Reinforcement Learning

Operations Research · 2022

Cited by 12

Journal rankings: Renmin University A · FT50 · UTD24 · ABS 4*

Sean R. Sinclair · Cornell University
Siddhartha Banerjee · Cornell University
Christina Lee Yu · Cornell University

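Summary

The paper proposes a data-driven adaptive discretization framework for nonparametric reinforcement learning. Compared with uniform discretization and kernel regression methods, it offers provable improvements in sample, storage, and computational complexity, and it achieves optimal performance on specific instances.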

Abstract

Adaptive Discretization in Reinforcement Learning. Performance guarantees for RL algorithms are typically for worst-case instances, which are pathological by design and not observed in meaningful applications. Moreover, many domains (such as computer systems and networking applications) have large state-action spaces and require algorithms to execute with low latency. This phenomenon highlights a trifecta of goals for practical RL algorithms: low sample, storage, and computational complexity. In this work, we develop an algorithmic framework for nonparametric RL with data-driven adaptive discretization. Our framework has provably better sample, storage, and computational complexity than uniform discretization or kernel regression methods. Moreover, we highlight how the performance guarantees are min-max optimal with respect to a novel instance-specific complexity measure that captures structure in facility location and newsvendor models.
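To make the contrast with uniform discretization concrete, here is a minimal sketch of the general idea of adaptive discretization on the interval [0, 1). It is not the paper's algorithm: the names `Cell` and `AdaptivePartition` are hypothetical, and the splitting rule (split a cell once its visit count reaches 1 / width²) is an assumption chosen only for illustration. The point is that the partition becomes fine where samples concentrate and stays coarse elsewhere, whereas a uniform grid would spend the same resolution everywhere.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    lo: float
    hi: float
    visits: int = 0

    @property
    def width(self) -> float:
        return self.hi - self.lo


class AdaptivePartition:
    """Partition of [0, 1) that refines itself where samples land."""

    def __init__(self) -> None:
        # Start with a single cell covering the whole space.
        self.cells = [Cell(0.0, 1.0)]

    def find(self, x: float) -> Cell:
        # Linear scan for clarity; a tree would be used in practice.
        for cell in self.cells:
            if cell.lo <= x < cell.hi:
                return cell
        raise ValueError("x must lie in [0, 1)")

    def observe(self, x: float) -> None:
        cell = self.find(x)
        cell.visits += 1
        # Assumed (illustrative) rule: split once visits >= 1 / width**2,
        # so smaller cells need more samples before being refined again.
        if cell.visits >= 1.0 / cell.width ** 2:
            mid = (cell.lo + cell.hi) / 2
            self.cells.remove(cell)
            self.cells += [Cell(cell.lo, mid), Cell(mid, cell.hi)]


partition = AdaptivePartition()
for _ in range(100):
    partition.observe(0.1)  # all samples land in the same region
```

After these 100 observations the partition holds only a handful of cells, with the narrowest ones around 0.1 and a single wide cell covering the unvisited half of the space.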

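Keywords: reinforcement learning, adaptive discretization, sample complexity, computational complexity, nonparametric methods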
