🌙

基于模型的离线零和马尔可夫博弈强化学习

Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

Operations Research · 2024
被引 5
人大 AFT50UTD24ABS 4*

中文导读

提出一种基于模型的算法,从离线数据中学习两人零和马尔可夫博弈的纳什均衡,样本复杂度仅与个体动作数线性相关,并证明了其极小化最优性。

Abstract

This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Despite a large number of prior works tackling this problem, the state-of-the-art results suffer from the curse of multiple agents in the sense that their sample complexity bounds scale linearly with the total number of joint actions. The current paper proposes a new model-based algorithm, which provably finds an approximate Nash equilibrium with a sample complexity that scales linearly with the total number of individual actions. This work also develops a matching minimax lower bound, demonstrating the minimax optimality of the proposed algorithm for a broad regime of interest. An appealing feature of the result lies in algorithmic simplicity, which reveals the unnecessity of sophisticated variance reduction and sample splitting in achieving sample optimality.

强化学习博弈论马尔可夫决策过程离线学习样本复杂度