Role-Policy Enhanced Collaborative Task Learning in Multiagent Systems
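This work proposes the RPEGP algorithm, which introduces the concept of roles into the actor–critic framework and trains role-policies and a global policy simultaneously, improving the efficiency and performance of collaborative learning in multiagent systems with sparse rewards and large-scale environments.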
Current mainstream multiagent reinforcement learning (MARL) algorithms primarily focus on acquiring the global maximum reward throughout the entire training process, from the initial to the final stage. However, directly pursuing the global maximum return tends to be inefficient, particularly in environments with sparse rewards or in large-scale multiagent systems. To address these challenges, previous algorithms have been developed that maintain individual policies to guide global training. Nevertheless, these approaches generally neglect either efficiency or the potential for local collaboration at the early stage of training. In this article, we propose the role-policy enhanced global policy (RPEGP) algorithm, which integrates the concept of distinct roles into the actor–critic-based MARL framework. RPEGP considers both collaborative behaviors among agents and efficient global policy training. Specifically, RPEGP exploits the similarities among agents to assign distinct roles, and trains the role-policies and the global policy concurrently. Through the initialization and enhancement of the role-policies, the global policy is trained more efficiently and effectively. Empirical experiments are conducted in well-known cooperative multiagent environments, including StarCraft II micromanagement (SMAC) and the multiagent particle environment (MPE). Experimental results demonstrate that RPEGP outperforms baseline algorithms on various evaluation metrics and in training efficiency, confirming its ability to address complex cooperative tasks generically and efficiently.
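
The idea of assigning roles from similarities among agents can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the grouping procedure, so everything here is an assumption made for illustration: the cosine similarity, the greedy threshold rule, the hypothetical `assign_roles` function, and the feature values. It is not RPEGP's actual role-assignment method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_roles(features, threshold=0.9):
    """Greedy grouping (an assumed rule, not taken from the paper).

    An agent joins the first role whose representative, i.e. the first agent
    assigned to that role, is at least `threshold`-similar; otherwise it
    opens a new role.
    """
    representatives = []  # one feature vector per role
    roles = []            # role id for each agent, in input order
    for f in features:
        for role_id, rep in enumerate(representatives):
            if cosine(f, rep) >= threshold:
                roles.append(role_id)
                break
        else:
            # No existing role was similar enough: create a new one.
            representatives.append(f)
            roles.append(len(representatives) - 1)
    return roles


# Hypothetical per-agent descriptors; the values are illustrative only.
agent_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
roles = assign_roles(agent_features)
print(roles)  # [0, 0, 1, 1]
```

In a setup like this, agents sharing a role id would share one role-policy, and those role-policies would be trained alongside a single global policy. How the two interact during training is defined by the article itself and is not reproduced here.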