Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

Operations Research · 2024

Cited by 5

Journal rankings: Renmin University (RUC) A · FT50 · UTD24 · ABS 4*

Gen Li · The Chinese University of Hong Kong
Yuling Yan · Massachusetts Institute of Technology
Yuxin Chen · University of Pennsylvania
Jianqing Fan · Princeton University

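Summary

The paper proposes a model-based algorithm that learns a Nash equilibrium of a two-player zero-sum Markov game from offline data. Its sample complexity scales only linearly with the number of individual actions, and the authors prove that it is minimax optimal in a broad regime of interest.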
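To make the "curse of multiple agents" concrete: with A actions for the max-player and B for the min-player, a bound that scales with the number of joint actions carries a factor of A·B, whereas a bound linear in individual actions carries A + B. The sketch below compares only these two factors for a few hypothetical action-space sizes. The function name `action_factors` is illustrative, and the sketch ignores every other quantity (states, discount, accuracy, data coverage) that enters an actual sample complexity bound.

```python
def action_factors(num_actions_a: int, num_actions_b: int) -> tuple[int, int]:
    """Return (joint, individual) action counts for a two-player game.

    joint      = A * B: the factor in bounds that suffer the curse of multiple agents
    individual = A + B: the factor in bounds linear in individual actions
    """
    joint = num_actions_a * num_actions_b
    individual = num_actions_a + num_actions_b
    return joint, individual


# Hypothetical action-space sizes, chosen only for illustration.
for a, b in [(2, 2), (10, 10), (100, 100)]:
    joint, individual = action_factors(a, b)
    print(f"A={a:>3}, B={b:>3}: A*B={joint:>6}, A+B={individual:>4}, "
          f"ratio={joint / individual:.1f}")
```

The ratio A·B/(A + B) always lies between min(A, B)/2 and min(A, B). The saving from individual-action scaling therefore grows with the smaller player's action set: for A = B = 100, the joint factor is 50 times larger.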

Abstract

This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Despite a large number of prior works tackling this problem, the state-of-the-art results suffer from the curse of multiple agents, in the sense that their sample complexity bounds scale linearly with the total number of joint actions. The current paper proposes a new model-based algorithm that provably finds an approximate Nash equilibrium with a sample complexity that scales linearly with the total number of individual actions. This work also develops a matching minimax lower bound, demonstrating the minimax optimality of the proposed algorithm for a broad regime of interest. An appealing feature of the result is its algorithmic simplicity, which shows that sophisticated variance reduction and sample splitting are unnecessary for achieving sample optimality.
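The abstract does not spell out the algorithm. The following is only a minimal sketch of the generic model-based recipe for a tabular discounted zero-sum game, and it is not the paper's method. First, estimate rewards and transitions from offline (s, a, b, r, s') tuples by empirical averages. Then run Shapley-style value iteration on that estimated model, solving a zero-sum matrix game at each state. The matrix-game solver uses multiplicative-weights (Hedge) self-play, chosen only to avoid external dependencies. All function names and parameters (`solve_matrix_game`, `empirical_model`, `model_based_value_iteration`, `lr`, `iters`, `sweeps`) are hypothetical. The sketch also ignores the central offline difficulty of poorly covered state-action pairs, which a real method must handle.

```python
import math
from collections import defaultdict


def _softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def solve_matrix_game(payoff, iters=1000, lr=0.1):
    """Approximate the value of a zero-sum matrix game (row player maximizes)
    by Hedge self-play; returns the payoff of the time-averaged strategies."""
    m, n = len(payoff), len(payoff[0])
    gain_x, loss_y = [0.0] * m, [0.0] * n  # cumulative payoff of each pure action
    avg_x, avg_y = [0.0] * m, [0.0] * n
    for _ in range(iters):
        x = _softmax([lr * g for g in gain_x])   # maximizer favors high cumulative gain
        y = _softmax([-lr * l for l in loss_y])  # minimizer favors low cumulative loss
        for i in range(m):
            avg_x[i] += x[i] / iters
        for j in range(n):
            avg_y[j] += y[j] / iters
        for i in range(m):
            gain_x[i] += sum(payoff[i][j] * y[j] for j in range(n))
        for j in range(n):
            loss_y[j] += sum(payoff[i][j] * x[i] for i in range(m))
    return sum(avg_x[i] * payoff[i][j] * avg_y[j] for i in range(m) for j in range(n))


def empirical_model(transitions, num_states):
    """Empirical mean reward and next-state distribution for each visited (s, a, b)."""
    counts = defaultdict(lambda: [0.0] * num_states)
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for s, a, b, r, s_next in transitions:
        counts[(s, a, b)][s_next] += 1
        reward_sum[(s, a, b)] += r
        visits[(s, a, b)] += 1
    p_hat = {k: [c / visits[k] for c in row] for k, row in counts.items()}
    r_hat = {k: reward_sum[k] / visits[k] for k in visits}
    return r_hat, p_hat


def model_based_value_iteration(transitions, num_states, num_a, num_b, gamma=0.9, sweeps=100):
    """Value iteration on the estimated model, with a matrix-game backup at each state."""
    r_hat, p_hat = empirical_model(transitions, num_states)
    v = [0.0] * num_states
    for _ in range(sweeps):
        new_v = []
        for s in range(num_states):
            q = [[0.0] * num_b for _ in range(num_a)]
            for a in range(num_a):
                for b in range(num_b):
                    key = (s, a, b)
                    # Unvisited (s, a, b) keep Q = 0: a crude placeholder, not an offline fix.
                    if key in p_hat:
                        q[a][b] = r_hat[key] + gamma * sum(
                            p * v_next for p, v_next in zip(p_hat[key], v))
            new_v.append(solve_matrix_game(q))
        v = new_v
    return v


# Toy check: one state that always loops back to itself. Row action 0 and
# column action 1 are dominant, so the stage-game value is 2 and the
# discounted value should approach 2 / (1 - gamma) = 20.
rewards = [[3.0, 2.0], [1.0, 0.0]]
data = [(0, a, b, rewards[a][b], 0) for a in range(2) for b in range(2)] * 5
print(model_based_value_iteration(data, num_states=1, num_a=2, num_b=2, gamma=0.9, sweeps=100))
```

Hedge self-play only approximates the equilibrium value, with an error that shrinks as `iters` grows. A linear-programming solver would give exact matrix-game values.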

Keywords: reinforcement learning · game theory · Markov decision processes · offline learning · sample complexity
