Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
提出一种基于模型的算法,从离线数据中学习两人零和马尔可夫博弈的纳什均衡,样本复杂度仅与个体动作数线性相关,并证明了其极小化最优性。
This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Despite a large number of prior works tackling this problem, the state-of-the-art results suffer from the curse of multiple agents in the sense that their sample complexity bounds scale linearly with the total number of joint actions. The current paper proposes a new model-based algorithm, which provably finds an approximate Nash equilibrium with a sample complexity that scales linearly with the total number of individual actions. This work also develops a matching minimax lower bound, demonstrating the minimax optimality of the proposed algorithm for a broad regime of interest. An appealing feature of the result lies in algorithmic simplicity, which reveals the unnecessity of sophisticated variance reduction and sample splitting in achieving sample optimality.