Memory-Efficient Inverse Reinforcement Learning for Multiplayer Differential Games
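This work proposes a memory-efficient inverse reinforcement learning algorithm that requires neither data storage nor a persistent excitation condition, for inferring unknown cost functions from expert demonstrations in multiplayer differential games, and designs a filter-based homotopic algorithm to address the difficulty of obtaining an initial admissible control policy.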
Data-driven inverse reinforcement learning (RL) control aims to infer the unknown cost function of a learner system from expert demonstrations. To converge, existing methods require a data storage mechanism that maintains persistent excitation (PE), which consumes memory and delays the satisfaction of full-rank conditions. To address these problems, in this article, we propose a novel memory-efficient inverse RL algorithm for multiplayer differential games that eliminates the need for strict PE and data storage. We prove that Nash equilibrium solutions for the learner system can be guaranteed under a mild initial excitation condition. In addition, existing inverse RL control algorithms often rely on an initial admissible control policy (IACP), which is difficult to obtain in data-driven scenarios. We address this problem by designing a novel filter-based homotopic RL algorithm, which derives an IACP for learner systems by shifting unstable poles into a stable region. Moreover, we establish several properties of the designed algorithms, including convergence, nonuniqueness, and stability. Finally, the effectiveness of the proposed algorithms is verified by comparative studies and simulation results.
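The pole-shifting idea behind the IACP construction can be illustrated with a small sketch. It is not the filter-based, data-driven, multiplayer algorithm of the article. It is a model-based, single-player linear-quadratic analogue under assumptions made here: known matrices `A` and `B`, weights `Q = R = I`, and a hypothetical step rule `shrink`. The shifted system `A - alpha*I` is stable for a large enough `alpha`, so the zero gain is admissible for it. Policy iteration then improves the gain, and `alpha` is reduced step by step to zero while the gain stays stabilizing.

```python
import numpy as np

def lyap(Ac, Qc):
    """Solve Ac^T P + P Ac + Qc = 0 by Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    P = np.linalg.solve(M, -Qc.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K, iters):
    """Kleinman policy iteration; K must stabilize A - B K on entry."""
    for _ in range(iters):
        P = lyap(A - B @ K, Q + K.T @ R @ K)   # policy evaluation
        K = np.linalg.solve(R, B.T @ P)        # policy improvement
    return K

def homotopic_gain(A, B, Q, R, shrink=0.5, pi_iters=20, max_steps=1000):
    """Return a gain K that stabilizes A - B K, starting from K = 0.

    `shrink`, `pi_iters` and `max_steps` are illustrative parameters
    assumed for this sketch, not values from the article.
    """
    n = A.shape[0]
    I = np.eye(n)
    # Shift all poles of A at least one unit into the left half-plane.
    alpha = max(0.0, np.linalg.eigvals(A).real.max() + 1.0)
    K = np.zeros((B.shape[1], n))  # admissible for the shifted system
    for _ in range(max_steps):
        K = policy_iteration(A - alpha * I, B, Q, R, K, pi_iters)
        if alpha == 0.0:
            return K
        # Stability margin of the shifted closed loop under the current K.
        margin = -np.linalg.eigvals(A - alpha * I - B @ K).real.max()
        # Reduce the shift by less than the margin so K stays stabilizing.
        alpha = max(0.0, alpha - shrink * margin)
    raise RuntimeError("homotopy did not reach alpha = 0")

# Example: open-loop unstable plant (eigenvalues 2 and -1).
A = np.array([[0.0, 1.0], [2.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = homotopic_gain(A, B, np.eye(2), np.eye(1))
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```

The stability check on the eigenvalues uses the model directly, which a data-driven method cannot do. That gap is where the article's filter-based design would come in.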