Adaptive Discretization in Online Reinforcement Learning
提出一种数据驱动的自适应离散化框架,用于非参数强化学习,相比均匀离散化和核回归方法,在样本、存储和计算复杂度上都有可证明的改进,并针对特定实例达到最优性能。
Adaptive Discretization in Reinforcement Learning Performance guarantees for RL algorithms are typically for worst case instances, which are pathological by design and not observed in meaningful applications. Moreover, many domains (such as computer systems and networking applications) have large state-action spaces and require algorithms to execute with low latency. This phenomenon highlights a trifecta of goals for practical RL algorithms: low sample, storage, and computational complexity. In this work, we develop an algorithmic framework for nonparametric RL with data-driven adaptive discretization. Our framework has provably better sample, storage, and computational complexity than uniform discretization or kernel regression methods. Moreover, we highlight how the performance guarantees are min-max optimal with respect to a novel instance-specific complexity measure that captures structure in facility location and newsvendor models.