Reinforcement Learning-Based Zero-Sum Games for Fuzzy Markov Jump Systems via Single-Loop Policy Iteration

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2026

Hao Shen
Yun Wang
Yi-Xiang Wang
Ju H. Park

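Summary

For the zero-sum game problem of nonlinear Markov jump systems, the authors propose a reinforcement learning-based single-loop iteration algorithm. Compared with traditional double-loop methods, it is more computationally efficient and requires a milder initial condition; its effectiveness is validated on a single-link robot arm model.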

Abstract

In this article, a reinforcement learning-based single-loop iteration scheme is developed to address the zero-sum game problem for nonlinear Markov jump systems (MJSs), where the Takagi–Sugeno fuzzy model is employed to describe the nonlinear dynamics. By resorting to game theory, the zero-sum game problem can be reformulated as solving the game algebraic Riccati equation (GARE). Compared with existing double-loop iteration methods, the proposed single-loop iterative scheme is more computationally efficient. Specifically, a parallel single-loop iterative method relying on system dynamics is first proposed to solve the GARE of MJSs under a milder initial condition. Then, a data-driven parallel algorithm is further developed to solve the GARE by using data collected along system trajectories instead of explicit system models. Moreover, compared with traditional Newton-based single-loop schemes, the proposed method has a more relaxed initialization condition. Rigorous convergence analyses are provided for both the proposed model-based and data-driven algorithms. Finally, a single-link robot arm model is employed to validate the effectiveness of the developed method.
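
The abstract does not reproduce the GARE itself. As a rough illustration only, the sketch below assumes the textbook continuous-time, linear-per-mode form of the coupled GARE for a Markov jump system, A_i'P_i + P_i A_i + Q_i + sum_j pi_ij P_j - P_i B_i R_i^{-1} B_i' P_i + gamma^{-2} P_i D_i D_i' P_i = 0, and checks how far a candidate solution is from satisfying it. The function name `gare_residuals`, the symbols, and the numerical values are all hypothetical; the paper's fuzzy-blended formulation and its iteration scheme may differ.

```python
import numpy as np

def gare_residuals(A, B, D, Q, R, Pi, P, gamma):
    """Left-hand side of the (assumed) coupled GARE for every mode i.

    A, B, D, Q, R, P are lists of per-mode matrices, Pi is the transition
    rate matrix, gamma is the disturbance attenuation level. A candidate
    P solves the assumed GARE when every returned matrix is zero.
    """
    n_modes = len(A)
    residuals = []
    for i in range(n_modes):
        # Mode coupling through the transition rates: sum_j pi_ij * P_j
        coupling = sum(Pi[i, j] * P[j] for j in range(n_modes))
        # Control term: P_i B_i R_i^{-1} B_i' P_i
        ctrl = P[i] @ B[i] @ np.linalg.solve(R[i], B[i].T) @ P[i]
        # Disturbance term: gamma^{-2} P_i D_i D_i' P_i
        dist = P[i] @ D[i] @ D[i].T @ P[i] / gamma**2
        residuals.append(A[i].T @ P[i] + P[i] @ A[i] + Q[i] + coupling - ctrl + dist)
    return residuals

# Scalar, single-mode sanity check with made-up numbers (not from the paper).
a, b, d, q, r, gamma = -1.0, 1.0, 1.0, 1.0, 1.0, 2.0
k = b**2 / r - d**2 / gamma**2            # net quadratic coefficient
p_star = (a + np.sqrt(a**2 + k * q)) / k  # positive root of 2*a*p + q - k*p**2 = 0

def mat(x):
    """Wrap a scalar as a 1x1 matrix."""
    return np.array([[x]])

res = gare_residuals([mat(a)], [mat(b)], [mat(d)], [mat(q)], [mat(r)],
                     np.zeros((1, 1)), [mat(p_star)], gamma)
```

A residual function of this kind is one way to monitor an iterative solver, model-based or data-driven: the iteration can stop once the largest residual norm across modes falls below a tolerance.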

Reinforcement learning · Fuzzy control · Markov jump systems · Zero-sum games · Nonlinear systems
