Model-Free Algorithms for Cooperative Output Regulation of Discrete-Time Multiagent Systems via Q-Learning Method
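For discrete-time multiagent systems with unknown system parameters, a model-free Q-learning algorithm is proposed that obtains the optimal policy directly, without requiring the system parameters or solving the regulator equations, and that also solves the problem of computing stabilizing gains when the initial policy is unstable.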
This article addresses the cooperative output regulation problem for discrete-time multiagent systems with unknown parameters, a problem that arises in many practical applications where system models are unavailable. Unlike existing techniques, the proposed approach uses a model-free Q-learning algorithm to obtain the optimal policy iteratively. The algorithm does not depend on the system parameters, and its immediate-cost formulation removes the need to solve the regulator equations. As a result, it has a streamlined structure that allows the optimal policy to be determined directly. The stability of each iteration of the algorithm is then formally established, and a condition for the uniqueness of the Q-function matrix is derived. In addition, to obtain a stable policy when the initial policy is unstable, a new data-driven algorithm is introduced that computes stabilizing initial gains, ensuring that stability is maintained throughout the learning process. We also show that the distributed observer and the excitation noise do not introduce bias. Finally, the efficacy of the proposed algorithm is validated through two simulation examples.
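To make the core idea of model-free Q-learning concrete, the sketch below shows generic Q-function policy iteration for a single-agent discrete-time linear-quadratic regulator. It is not the authors' multiagent output-regulation algorithm: there is no distributed observer, no exosystem, and no regulator structure. The plant matrices, cost weights, initial gain `K0`, sample counts, and the helper names `q_learning_lqr` and `riccati_gain` are all hypothetical. The learner only sees `(x, u, x_next)` samples and the stage cost. It fits the quadratic Q-function matrix `H` by least squares (policy evaluation), then sets `K = H_uu^{-1} H_ux` (policy improvement). As in the abstract's setting, it assumes a stabilizing initial gain is available. Because the next action in the Bellman equation follows the noise-free target policy, the exploration noise on the applied input does not bias the fit in this simplified setting. A model-based Riccati iteration is included only to check the result.

```python
import numpy as np


def q_learning_lqr(A, B, Qc, Rc, K0, iters=20, samples=200, seed=0):
    """Model-free policy iteration on the Q-function of a discrete-time LQR.

    A and B are used only inside `plant_step`, a hypothetical stand-in for the
    real system; the learner sees only (x, u, x_next) tuples and stage costs.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    p = n + m

    def plant_step(x, u):
        return A @ x + B @ u

    K = K0.copy()
    for _ in range(iters):
        rows, costs = [], []
        for _ in range(samples):
            x = rng.normal(size=n)
            u = -K @ x + rng.normal(size=m)  # exploration noise on the applied input
            x_next = plant_step(x, u)
            z = np.concatenate([x, u])
            # The next action follows the target policy (no noise), so the
            # Bellman equation z'Hz = cost + z_next'H z_next holds exactly.
            z_next = np.concatenate([x_next, -K @ x_next])
            rows.append(np.kron(z, z) - np.kron(z_next, z_next))
            costs.append(x @ Qc @ x + u @ Rc @ u)
        # Policy evaluation: least-squares fit of vec(H); the minimum-norm
        # solution is symmetric, and we symmetrize anyway for safety.
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
        H = theta.reshape(p, p)
        H = 0.5 * (H + H.T)
        # Policy improvement: u = -H_uu^{-1} H_ux x minimizes Q(x, u).
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, H


def riccati_gain(A, B, Qc, Rc, iters=5000):
    """Model-based reference: iterate the discrete-time Riccati equation."""
    P = np.zeros_like(A)
    for _ in range(iters):
        G = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
        P = Qc + A.T @ P @ A - A.T @ P @ B @ G
    return np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)


# Hypothetical example: a discretized double integrator (open-loop eigenvalues at 1).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)
K0 = np.array([[10.0, 5.0]])  # assumed stabilizing initial gain
assert max(abs(np.linalg.eigvals(A - B @ K0))) < 1

K, H = q_learning_lqr(A, B, Qc, Rc, K0)
print("learned K:", K, "Riccati K:", riccati_gain(A, B, Qc, Rc))
```

Sampling fresh random states at each iteration is a convenience that guarantees rich excitation. A trajectory-based variant would also need a persistency-of-excitation condition so that the least-squares problem has a unique symmetric solution `H`.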