Empirical Policy Optimization for n-Player Markov Games

IEEE Transactions on Cybernetics · 2022

Cited by 20

ABS 3

Yuanheng Zhu
Weifan Li
Mengchen Zhao
Jianye Hao
Dongbin Zhao

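Summary

Proposes a new learning scheme that evolves each player's policy by aggregating its historical performance, so that players in multiplayer Markov games converge to an approximate Nash equilibrium, and develops a distributed reinforcement-learning algorithm.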

Abstract

In single-agent Markov decision processes, an agent can optimize its policy based on its interaction with the environment. In multiplayer Markov games (MGs), however, the interaction is nonstationary due to the behaviors of other players, so the agent has no fixed optimization objective. The challenge becomes finding equilibrium policies for all players. In this research, we treat the evolution of player policies as a dynamical process and propose a novel learning scheme for Nash equilibrium. The core idea is to evolve a player's policy according to not just its current in-game performance, but an aggregation of its performance over history. We show that for a variety of MGs, players in our learning scheme will provably converge to a point that approximates a Nash equilibrium. Combined with neural networks, we develop an empirical policy optimization algorithm, which is implemented in a reinforcement-learning framework and runs in a distributed way, with each player optimizing its policy based on its own observations. We use two numerical examples to validate the convergence property on small-scale MGs, and a Pong example to show the potential on large games.
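The idea of driving a policy by aggregated rather than instantaneous performance can be illustrated on a toy game. The following is a minimal sketch and not the paper's algorithm: it assumes a two-player matching-pennies matrix game, a plain running mean as the aggregation, and a softmax policy over the averaged action payoffs (a smoothed fictitious-play style update). The function names, the temperature, the iteration count, and the initial scores are all hypothetical choices made for illustration.

```python
import math


def softmax(scores, temperature):
    """Map action scores to a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Matching pennies (assumed toy game): entry [a][b] is the row player's
# payoff; the column player receives the negative of it (zero-sum).
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]


def action_payoffs(policy_row, policy_col):
    """Expected payoff of each pure action against the opponent's policy."""
    u_row = [sum(PAYOFF[a][b] * policy_col[b] for b in range(2))
             for a in range(2)]
    u_col = [-sum(PAYOFF[a][b] * policy_row[a] for a in range(2))
             for b in range(2)]
    return u_row, u_col


def run(num_iters=5000, temperature=0.5):
    # Initial scores are arbitrary; they only shape the very first policy,
    # because the first update (step = 1) overwrites them.
    avg_row = [1.0, 0.0]
    avg_col = [0.0, 0.0]
    for t in range(num_iters):
        # Each player acts on its historical average, not the latest payoff.
        pi_row = softmax(avg_row, temperature)
        pi_col = softmax(avg_col, temperature)
        u_row, u_col = action_payoffs(pi_row, pi_col)
        # Step 1/(t+1) keeps avg_* equal to the mean of all payoff
        # vectors observed so far.
        step = 1.0 / (t + 1)
        avg_row = [y + step * (u - y) for y, u in zip(avg_row, u_row)]
        avg_col = [y + step * (u - y) for y, u in zip(avg_col, u_col)]
    return softmax(avg_row, temperature), softmax(avg_col, temperature)


final_row, final_col = run()
print("row policy:", final_row)
print("column policy:", final_col)
```

In matching pennies the Nash equilibrium is uniform mixing for both players, so both printed policies should end up close to (0.5, 0.5). In general games the softmax temperature makes the rest point of such an update only an approximation to a Nash equilibrium, which is in the spirit of the approximate convergence described in the abstract, though the paper's actual scheme and guarantees should be taken from the paper itself.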

Reinforcement learning · Markov games · Nash equilibrium · Machine learning · Artificial intelligence
