🌙

马尔可夫随机逼近有限样本保证的李雅普诺夫理论

A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation

Operations Research · 2023
被引 7
人大 AFT50UTD24ABS 4*

中文导读

Chen等人提出了一个统一的李雅普诺夫框架,用于分析马尔可夫随机逼近算法的有限样本性能,并通过时序差分学习和Q-learning等强化学习算法展示了其有效性。

Abstract

The stochastic approximation (SA) method stands as the foundational mathematical tool for modern large-scale optimization and machine learning. Therefore, gaining a theoretical understanding of SA algorithms is of fundamental interest. In their paper titled “A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation,” Chen et al. present a unified Lyapunov framework for the finite-sample analysis of a Markovian SA algorithm under a contractive operator with respect to an arbitrary norm. The key novelty lies in the construction of a smooth Lyapunov function called the generalized Moreau envelope. The authors demonstrate the effectiveness of their SA results in the context of reinforcement learning (RL), specifically through popular algorithms such as variants of temporal difference (TD) learning and Q-learning. As byproducts, the results provide theoretical insights into the efficiency of bootstrapping in TD learning with eligibility traces and the bias-variance tradeoff in off-policy learning.

随机逼近强化学习李雅普诺夫函数机器学习理论