Model-Free Q-Learning for Output Feedback Nash Strategy of Decentralized Nonzero-Sum Games

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2025

Cited by 1

ABS 3

Qiyan Zhang
Hongxia Wang
Kai Peng
Huanshui Zhang

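Summary

Proposes a model-free output feedback Q-learning algorithm that solves for the Nash equilibrium strategy of decentralized nonzero-sum games under asymmetric information, learning the optimal controllers online using only input and measurement information.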

Abstract

In this article, we present a model-free output feedback (OPFB) Q-learning algorithm to find the optimal Nash equilibrium strategy for the decentralized control problem (DCP) of nonzero-sum games with asymmetric information. The main challenge lies in the different historical information available to each controller: the input information is shared, while the measurement information is private. To overcome this difficulty, a novel optimal Nash strategy in input/output form is derived without requiring measurable system states. Then, the OPFB Q-learning iteration algorithm is developed to learn the optimal controllers online using only the available input and measurement information, rather than the system dynamics and states. The key is solving the equilibrium equations under asymmetric information, which is achieved by reformulating them into a constrained minimization problem, yielding the numerical solution of the optimal controller pair. To the best of the authors’ knowledge, the presented idea is new. Numerical examples are shown to illustrate the effectiveness of the proposed algorithm.
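
The abstract's central step, recasting equilibrium equations as a minimization problem, can be illustrated on a toy case. The sketch below is not the authors' algorithm: it assumes a hypothetical static two-player game with scalar actions and quadratic costs, drops the constraints (the abstract does not specify them), and uses plain gradient descent on the squared residual of the stacked first-order conditions. All names, costs, and parameters are illustrative assumptions.

```python
# Toy illustration (assumed setup, not the paper's method):
# two players with scalar actions u1, u2 and costs
#   J1 = u1^2 + u1*u2 - 4*u1      (player 1 minimizes over u1)
#   J2 = u2^2 + u1*u2 - 5*u2      (player 2 minimizes over u2)
# Each cost is strictly convex in the player's own action, so the Nash
# equilibrium is characterized by the first-order conditions
#   dJ1/du1 = 2*u1 + u2 - 4 = 0
#   dJ2/du2 = u1 + 2*u2 - 5 = 0
# written compactly as M u + d = 0.

def equilibrium_residual(u, M, d):
    """Residual M u + d of the stacked equilibrium equations."""
    n = len(u)
    return [sum(M[i][j] * u[j] for j in range(n)) + d[i] for i in range(n)]


def solve_by_minimization(M, d, step=0.05, iters=5000):
    """Minimize 0.5 * ||M u + d||^2 by gradient descent (unconstrained)."""
    n = len(d)
    u = [0.0] * n
    for _ in range(iters):
        r = equilibrium_residual(u, M, d)
        # Gradient of the squared residual is M^T r.
        grad = [sum(M[i][j] * r[i] for i in range(n)) for j in range(n)]
        u = [u[j] - step * grad[j] for j in range(n)]
    return u


M = [[2.0, 1.0], [1.0, 2.0]]
d = [-4.0, -5.0]
u_star = solve_by_minimization(M, d)
print(u_star)  # approximately [1.0, 2.0]
```

At the returned point the residual vanishes, so neither player can lower its own cost by changing its action unilaterally. In the paper's setting the unknowns are controller parameters learned from input and measurement data, and the minimization is constrained; this sketch only shows the reformulation idea.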

Reinforcement learning · Game theory · Optimal control · Decentralized control · Nonzero-sum games