Evolutionary Reinforcement Learning With Late-Start Evolution and Clustering Archive
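Proposes a late-start clustering evolutionary reinforcement learning algorithm that uses a late-start strategy, a double opposite proximal mutation operator, and a clustering selection method to improve individual quality and diversity, and that performs better on the MuJoCo benchmark and an energy management problem.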
Evolutionary Reinforcement Learning (ERL) is a new learning paradigm that integrates Evolutionary Algorithms (EAs) with Reinforcement Learning (RL). Existing ERL methods strike a poor balance between individual quality and diversity, which causes experience mismatch: delayed experiences generated by the population hinder the training of the RL agent. To address this problem, we propose a Late-start Clustering Evolutionary Reinforcement Learning (LCERL) algorithm that improves individual quality and diversity, thereby enhancing the synergy between the population and the RL agent. First, a late-start strategy keeps poor experiences generated by the population from harming the RL agent’s training in the early stage. Second, a double opposite proximal mutation operator is designed and applied to the RL agent to generate high-quality individuals that are comparable to the RL agent. Third, a clustering selection method with an archive is designed to select diverse individuals for experience generation. Experimental results on the MuJoCo benchmark and a real-world energy management problem demonstrate the superior performance and practicability of LCERL.
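The abstract does not give algorithmic details, so the following is only a minimal sketch of two of the ideas under stated assumptions, not the LCERL procedure itself. It assumes that the late start can be modeled as a step threshold (`late_start_step`, a hypothetical parameter) and that diversity is measured on per-individual descriptor vectors grouped with plain k-means, with the fittest member of each cluster selected. All function names are illustrative, and the archive and the mutation operator are not modeled.

```python
import math
import random


def population_enabled(step: int, late_start_step: int) -> bool:
    """Late start (assumed form): the population only begins to
    contribute experiences once a warm-up threshold is reached."""
    return step >= late_start_step


def kmeans_labels(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on a small list of float vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def cluster_select(descriptors, fitness, k):
    """Return one index per non-empty cluster: its fittest member.
    This stands in for 'selecting diverse individuals'; it is an
    illustrative assumption, not the paper's selection rule."""
    labels = kmeans_labels(descriptors, k)
    best = {}
    for i, label in enumerate(labels):
        if label not in best or fitness[i] > fitness[best[label]]:
            best[label] = i
    return sorted(best.values())
```

As a usage example, with descriptors `[[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]`, fitness values `[1.0, 2.0, 3.0, 4.0]`, and `k=2`, `cluster_select` returns `[1, 3]`: the two nearby pairs form separate clusters and the fitter member of each pair is kept. Picking one individual per cluster, rather than the top individuals overall, is what keeps the selected set spread out in descriptor space.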