From Offline to Online: Pretrained Dynamic Optimization with Multi-Agent Reinforcement Learning
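Proposes PreDOF, a pretrained dynamic optimization framework that uses multi-agent reinforcement learning in two stages, offline pre-training and online fine-tuning, to improve convergence in online dynamic optimization scenarios. It outperforms existing methods on both benchmarks and a real-world case study.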
Dynamic optimization enables efficient decision-making in complex, time-varying, and uncertain environments, and therefore has a wide range of applications. However, existing methods mainly focus on offline dynamic optimization problems. In online optimization scenarios, where stronger dynamics and more frequent changes impose stricter time constraints on adaptation, their convergence performance degrades significantly. To fill this gap, we propose a pretrained dynamic optimization framework (PreDOF) for online scenarios. The framework introduces a novel learning paradigm, the Evolutionary Markov Decision Process (EMDP), which is designed to exploit the real-time decision-making ability of multiple agents. PreDOF has two stages: 1) Offline Pre-training: multi-agent reinforcement learning is applied to large-scale evolutionary data to capture patterns of environmental change, and the agents are trained in each dynamic environment to guide the direction of evolution; 2) Online Fine-tuning: an online agent is adjusted using the offline experience to obtain real-time optimization outcomes. The online agent is trained to join the offline agents through a Weighted Joint Decision strategy, which accelerates online convergence and enables a prompt response to changes. Experimental results on benchmarks demonstrate that the proposed framework outperforms competing methods in both effectiveness and convergence. A real-world case study further shows that the framework performs better in highly constrained and challenging dynamic optimization scenarios.
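The abstract does not specify how the Weighted Joint Decision strategy combines the agents, so the following is only a minimal sketch of one plausible reading: each agent scores a shared set of candidate evolution actions, the scores are turned into probabilities, and the probabilities are mixed with normalized weights. The function names (`weighted_joint_decision`, `select_action`), the softmax step, the linear mixing rule, and all numeric values are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw action scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_joint_decision(
    offline_scores: Sequence[Sequence[float]],
    online_scores: Sequence[float],
    offline_weights: Sequence[float],
    online_weight: float,
) -> List[float]:
    """Mix the action distributions of offline agents and one online agent.

    Hypothetical mixing rule (an assumption, not the paper's definition):
    joint(a) = sum_i w_i * softmax(scores_i)(a), with weights normalized to 1.
    """
    if len(offline_scores) != len(offline_weights):
        raise ValueError("one weight is required per offline agent")
    weights = list(offline_weights) + [online_weight]
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")

    total_weight = sum(weights)
    normalized = [w / total_weight for w in weights]
    all_scores = [list(s) for s in offline_scores] + [list(online_scores)]

    n_actions = len(online_scores)
    joint = [0.0] * n_actions
    for weight, scores in zip(normalized, all_scores):
        if len(scores) != n_actions:
            raise ValueError("all agents must score the same action set")
        for action, prob in enumerate(softmax(scores)):
            joint[action] += weight * prob
    return joint


def select_action(joint: Sequence[float]) -> int:
    """Pick the action with the highest joint probability."""
    return max(range(len(joint)), key=lambda a: joint[a])


# Illustrative values only: two offline agents and one online agent
# scoring three candidate evolution actions.
offline = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
online = [0.0, 0.0, 3.0]
joint = weighted_joint_decision(offline, online, [0.25, 0.25], 0.5)
chosen = select_action(joint)
```

In this toy setting the online agent holds half of the total weight, so its preferred action dominates the joint distribution. Shifting weight toward the offline agents would instead favor the patterns learned during pre-training, which is the trade-off such a weighting is meant to expose.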