Multiplayer Differential Games of Markov Jump Systems via Reinforcement Learning
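This work studies the online solution of multiplayer differential games for Markov jump systems using reinforcement learning, proposes two algorithms (a distributed minmax strategy and an asynchronous Nash strategy), and validates their effectiveness on inverted pendulum systems.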
This article addresses the online solution of multiplayer differential games (MDGs) for Markov jump systems (MJSs) using reinforcement learning (RL). We consider two scenarios. In the first, we propose a distributed minmax strategy in which each player derives its optimal control policy from distributed game algebraic Riccati equations (DGAREs) without prior knowledge of the policies adopted by the other players, which distinguishes it from existing RL algorithms. We design a novel online distributed RL algorithm that approximates the solution of the DGAREs without requiring complete knowledge of the system dynamics or an initial admissible control policy. In the second scenario, we apply a Nash strategy to MDGs of MJSs. Unlike existing synchronous RL algorithms, the proposed online asynchronous RL algorithm performs both policy evaluation and policy improvement asynchronously, so that each player's update incorporates the latest information available in the iteration. The convergence of the designed RL algorithms is rigorously analyzed. Finally, two inverted pendulum system applications validate the effectiveness of the proposed methods.
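The asynchronous update order described above can be illustrated with a deliberately simplified sketch. It is not the article's algorithm: it assumes a scalar, single-mode linear system (no Markov jumps), a known model, and quadratic costs, and every name in it (`async_nash_iteration`, `a`, `b`, `q`, `r`, `sweeps`) is hypothetical. It only shows the idea that each player evaluates and improves its policy using the most recent gains of the other players, rather than the gains from the previous sweep.

```python
def async_nash_iteration(a, b, q, r, sweeps=50):
    """Asynchronous (Gauss-Seidel style) policy iteration for a toy game.

    Assumed toy model (not taken from the article):
        dx/dt = a*x + sum_i b[i]*u_i,   u_i = -k[i]*x,
        J_i   = integral of (q[i]*x**2 + r[i]*u_i**2) dt.
    Requires a < 0 so that the zero gains used as a start are admissible.
    """
    n = len(b)
    k = [0.0] * n  # feedback gains, one per player
    p = [0.0] * n  # value coefficients, V_i(x) = p[i] * x**2
    for _ in range(sweeps):
        for i in range(n):
            # Closed-loop drift built from the latest gains of all players,
            # including gains already updated earlier in this sweep.
            a_cl = a - sum(b[j] * k[j] for j in range(n))
            if a_cl >= 0.0:
                raise ValueError("closed loop is not stable")
            # Policy evaluation: solve 2*a_cl*p + q + r*k**2 = 0 for p.
            p[i] = (q[i] + r[i] * k[i] ** 2) / (-2.0 * a_cl)
            # Policy improvement for player i only.
            k[i] = b[i] * p[i] / r[i]
    return p, k


if __name__ == "__main__":
    p, k = async_nash_iteration(-1.0, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
    print("value coefficients:", p)
    print("feedback gains:", k)
```

For the symmetric two-player case in the usage lines (a = -1 and all other parameters equal to 1), the coupled Riccati equations of this toy game reduce to 3k² + 2k - 1 = 0, so both gains approach 1/3. A synchronous variant would compute `a_cl` once per sweep from the previous sweep's gains; extending the sketch to several modes would add transition-rate coupling terms between the per-mode value coefficients, which is omitted here.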