Reinforcement Learning-Based Zero-Sum Games for Fuzzy Markov Jump Systems via Single-Loop Policy Iteration
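Summary: For the zero-sum game problem of nonlinear Markov jump systems, a reinforcement learning-based single-loop iteration algorithm is proposed. Compared with traditional double-loop methods, it is more computationally efficient and has a more relaxed initial condition, and its effectiveness is validated on a single-link robot arm model.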
In this article, a reinforcement learning-based single-loop iteration scheme is developed to address the zero-sum game problem for nonlinear Markov jump systems (MJSs), where the Takagi–Sugeno fuzzy model is employed to describe the nonlinear dynamics. By resorting to game theory, the zero-sum game problem can be reformulated as solving the game algebraic Riccati equation (GARE). Compared with existing double-loop iteration methods, the proposed single-loop iterative scheme is more computationally efficient. Specifically, a parallel single-loop iterative method relying on system dynamics is first proposed to solve the GARE of MJSs under a milder initial condition. Then, a data-driven parallel algorithm is further developed to solve the GARE by using data collected along system trajectories instead of explicit system models. Moreover, compared with traditional Newton-based single-loop schemes, the proposed method has a more relaxed initialization condition. Rigorous convergence analyses are provided for both the proposed model-based and data-driven algorithms. Finally, a single-link robot arm model is employed to validate the effectiveness of the developed method.
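For orientation, the kind of equation the abstract refers to can be sketched as follows. This is the standard coupled GARE for a continuous-time linear Markov jump system with N modes, not necessarily the exact form used in the article. All symbols here are assumptions introduced for illustration: mode-dependent matrices A_i, B_i, D_i, weights Q_i and R_i, attenuation level \gamma, transition rates \pi_{ij}, and unknowns P_i. In the Takagi–Sugeno fuzzy setting the system matrices would additionally be blended by fuzzy membership functions, which this sketch omits.

```latex
% Assumed dynamics in mode i:  \dot{x} = A_i x + B_i u + D_i w
% Assumed cost: J = E \int_0^\infty ( x^T Q_i x + u^T R_i u - \gamma^2 w^T w ) dt,
% with u the minimizing player (control) and w the maximizing player (disturbance).
\begin{equation}
  A_i^{\top} P_i + P_i A_i + Q_i + \sum_{j=1}^{N} \pi_{ij} P_j
  - P_i B_i R_i^{-1} B_i^{\top} P_i
  + \gamma^{-2} P_i D_i D_i^{\top} P_i = 0,
  \qquad i = 1, \dots, N .
\end{equation}
% Saddle-point policies associated with the solution P_i:
\begin{equation}
  u^{*} = -R_i^{-1} B_i^{\top} P_i \, x ,
  \qquad
  w^{*} = \gamma^{-2} D_i^{\top} P_i \, x .
\end{equation}
```

Under these assumptions the N equations are coupled through the term \sum_j \pi_{ij} P_j, which is why they have to be solved jointly across modes. Each equation is also quadratic in P_i with terms of opposite sign, which is the usual reason iterative solvers are sensitive to initialization.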