Surrogate-Assisted Evolutionary Q-Learning for Black-Box Dynamic Time-Linkage Optimization Problems

IEEE Transactions on Evolutionary Computation · 2022

Cited by 19

ABS 4

Tuo Zhang
Handing Wang
Bo Yuan
Xin Yao
Yaochu Jin

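Summary

For continuous black-box dynamic time-linkage optimization problems, the paper proposes an evolutionary algorithm that combines surrogate models with Q-learning. It handles the time-linkage property through state extraction and prediction, and it outperforms the compared algorithms on multiple benchmark problems.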

Abstract

Dynamic time-linkage optimization problems (DTPs) are a special class of dynamic optimization problems (DOPs) with the time-linkage property: the environment of a DTP not only changes over time but also depends on previously applied solutions. Existing dynamic evolutionary algorithms can hardly solve DTPs because they ignore the time-linkage property. In fact, DTPs can be viewed as multiple decision-making problems and solved by reinforcement learning (RL). However, RL-based evolutionary optimization algorithms have so far solved only some discrete DTPs, under the assumption of observable objective functions. In this work, we propose a dynamic evolutionary optimization algorithm using surrogate-assisted $Q$-learning for continuous black-box DTPs. To observe the states of black-box DTPs, state extraction and prediction methods are applied after the search process at each time step. Based on the learned information, surrogate-assisted $Q$-learning is introduced to evaluate and select candidate solutions in the continuous decision space with long-term consideration. We evaluate the components of the proposed algorithm on various benchmark problems to study their behaviors. Results of comparative experiments indicate that the proposed algorithm outperforms the compared algorithms and performs robustly on DTPs with up to 30 decision variables and different dynamic changes.
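
To make the idea of evaluating candidate solutions by long-term value more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a maximization setting, assumes the extracted state is already available as a numeric tuple, and swaps in a simple nearest-neighbor averaging model as the surrogate. The names `NearestNeighborQ`, `q_target`, and `select_candidate` are hypothetical and chosen only for illustration.

```python
import math


class NearestNeighborQ:
    """Hypothetical surrogate (an assumption, not the paper's model):
    predicts Q(state, action) by averaging the k nearest stored samples."""

    def __init__(self, k=3):
        self.k = k
        self.points = []  # concatenated (state, action) feature tuples
        self.values = []  # Q-value targets recorded for those points

    def add(self, state, action, q_value):
        self.points.append(tuple(state) + tuple(action))
        self.values.append(float(q_value))

    def predict(self, state, action):
        if not self.points:
            return 0.0  # no data yet: return a neutral estimate
        query = tuple(state) + tuple(action)
        ranked = sorted(
            zip(self.points, self.values),
            key=lambda pv: math.dist(pv[0], query),
        )
        nearest = ranked[: self.k]
        return sum(v for _, v in nearest) / len(nearest)


def q_target(reward, next_state, next_candidates, surrogate, gamma=0.9):
    """Standard Q-learning target: r + gamma * max over a' of Q(s', a')."""
    best_next = max(
        (surrogate.predict(next_state, a) for a in next_candidates),
        default=0.0,
    )
    return reward + gamma * best_next


def select_candidate(state, candidates, surrogate):
    """Pick the candidate with the highest predicted long-term value."""
    return max(candidates, key=lambda a: surrogate.predict(state, a))


# Toy usage with made-up numbers (one-dimensional state and action).
surrogate = NearestNeighborQ(k=1)
surrogate.add(state=(0.0,), action=(0.2,), q_value=1.0)
surrogate.add(state=(0.0,), action=(0.8,), q_value=3.0)

candidates = [(0.1,), (0.9,)]
best = select_candidate((0.0,), candidates, surrogate)
target = q_target(1.0, (0.0,), candidates, surrogate, gamma=0.5)
```

In this sketch the surrogate stands in for a Q-table, which is not practical over a continuous decision space; any regression model exposing a `predict(state, action)` method could be substituted under the same assumptions.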

Dynamic optimization · Evolutionary algorithms · Reinforcement learning · Surrogate models · Black-box optimization
