Cooperative Finitely Excited Learning for Dynamical Games

IEEE Transactions on Cybernetics · 2023

Cited by 127 · Top 2% among papers in the same journal and year

ABS 3

Yongliang Yang
Hamidreza Modares
Kyriakos G. Vamvoudakis
Frank L. Lewis

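Summary

For continuous-time zero-sum dynamic games, the paper proposes a cooperative finitely excited learning method that combines online data with recorded historical data to replace the conventional persistent excitation condition, while guaranteeing system stability and convergence to the Nash equilibrium.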

Abstract

In this article, we propose a way to enhance the learning framework for zero-sum games with dynamics evolving in continuous time. In contrast to the conventional centralized actor-critic learning, a novel cooperative finitely excited learning approach is developed to combine the online recorded data with instantaneous data for efficiency. By using an experience replay technique for each agent and distributed interaction amongst agents, we are able to replace the classical persistent excitation condition with an easy-to-check cooperative excitation condition. This approach also guarantees the consensus of the distributed actor-critic learning on the solution to the Hamilton-Jacobi-Isaacs (HJI) equation. It is shown that both the closed-loop stability of the equilibrium point and convergence to the Nash equilibrium can be guaranteed. Simulation results demonstrate the efficacy of this approach compared to previous methods.
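The abstract does not state the cooperative excitation condition itself. As a rough illustration of why such a condition can be easy to check, the sketch below assumes it takes the form of a rank test on recorded regressor vectors, as is common in experience-replay (concurrent learning) schemes: no single agent's memory needs to span the parameter space, as long as the pooled memories do. The function names, the rank-test form, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def matrix_rank(rows: List[Vector], tol: float = 1e-9) -> int:
    """Rank of a matrix given as a list of rows, via Gaussian elimination."""
    m = [list(map(float, r)) for r in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Choose the row at or below `rank` with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def is_excited(memory: List[Vector], dim: int) -> bool:
    """Hypothetical single-agent check: recorded regressors span R^dim."""
    return len(memory) > 0 and matrix_rank(memory) == dim


def is_cooperatively_excited(memories: List[List[Vector]], dim: int) -> bool:
    """Hypothetical cooperative check: the pooled regressors span R^dim."""
    pooled = [phi for memory in memories for phi in memory]
    return is_excited(pooled, dim)


# Toy data (made up): each agent stores regressors for a 3-parameter critic.
agent_a = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]  # spans only one direction
agent_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # spans the other two

print(is_excited(agent_a, 3))                           # single agent A
print(is_excited(agent_b, 3))                           # single agent B
print(is_cooperatively_excited([agent_a, agent_b], 3))  # pooled memories
```

Unlike persistent excitation, which is a requirement on the whole future trajectory, a test of this kind runs on a finite batch of stored data and can be evaluated online as the memories fill.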

Game theory · Machine learning · Control theory · Artificial intelligence