Learning to Construct a Solution for the Agile Satellite Scheduling Problem With Time-Dependent Transition Times
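For the agile Earth observation satellite scheduling problem with time-dependent transition times, a deep reinforcement learning construction model is proposed. It comprises five parts, including a Markov decision process, feature engineering, and a constructive heuristic neural network, and experiments show that it outperforms existing algorithms in both optimization speed and quality.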
The agile Earth observation satellite scheduling problem (AEOSSP) with time-dependent transition times is a complex combinatorial optimization problem that has emerged with the development of large-scale satellite management techniques. To address this problem, we propose a deep reinforcement learning-based construction model (DRL-CM) that consists of five parts: 1) a Markov decision process (MDP); 2) feature engineering; 3) a constructive heuristic neural network (CHNN); 4) an RL training method; and 5) an evaluation system. Specifically, the CHNN comprises six modules, including three novel components that we propose: a dynamic encoder, a dynamic global layer, and a two-stage attention layer. First, we formulate the AEOSSP as an MDP and design feature engineering that supplies the effective features required for decision-making. Second, we design the CHNN to serve as the MDP policy and train it with RL. Finally, we propose a comprehensive evaluation system to validate our model. The experimental results indicate that the proposed DRL-CM outperforms the state-of-the-art algorithm in terms of both optimization speed and solution quality. In addition, comprehensive experiments verify the effectiveness of the feature engineering and network architecture built into our model.
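To make the idea of "constructing a solution" through an MDP more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a state made of the current time, the last observed task, and the set of already scheduled tasks; an action is the choice of the next feasible task; and the reward is that task's profit. The `transition_time` function is a hypothetical toy model that depends on both tasks and on the current time, standing in for a real attitude-maneuver model. The `greedy_policy` is a hypothetical stand-in for the trained CHNN, and all names (`Task`, `State`, `construct`, etc.) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    tid: int
    window_start: float  # start of the visible time window (s)
    window_end: float    # end of the visible time window (s)
    duration: float      # imaging duration (s)
    profit: float        # assumed reward for observing this task


@dataclass(frozen=True)
class State:
    time: float = 0.0                 # time at which the satellite becomes free
    last: Optional[Task] = None       # most recently observed task
    done_ids: frozenset = frozenset() # ids of tasks already scheduled


def transition_time(prev: Optional[Task], nxt: Task, t: float) -> float:
    """Hypothetical time-dependent transition time (toy model, not from the paper)."""
    if prev is None:
        return 0.0
    # Assumed: a fixed settling cost, a term for the "distance" between tasks,
    # and a term that varies with the departure time t.
    return 5.0 + 0.5 * abs(prev.tid - nxt.tid) + 0.01 * t


def earliest_start(state: State, task: Task) -> float:
    ready = state.time + transition_time(state.last, task, state.time)
    return max(ready, task.window_start)


def feasible_actions(state: State, tasks: List[Task]) -> List[Task]:
    # Action mask: unscheduled tasks that can still finish inside their window.
    return [t for t in tasks
            if t.tid not in state.done_ids
            and earliest_start(state, t) + t.duration <= t.window_end]


def step(state: State, task: Task) -> Tuple[State, float, float]:
    start = earliest_start(state, task)
    nxt = State(time=start + task.duration, last=task,
                done_ids=state.done_ids | {task.tid})
    return nxt, start, task.profit


def greedy_policy(state: State, actions: List[Task]) -> Task:
    # Stand-in for the CHNN: highest profit first, ties broken by earliest start.
    return max(actions, key=lambda t: (t.profit, -earliest_start(state, t)))


def construct(tasks: List[Task], policy: Callable[[State, List[Task]], Task]):
    """Roll out one episode: pick tasks until no feasible action remains."""
    state, schedule, total = State(), [], 0.0
    while True:
        actions = feasible_actions(state, tasks)
        if not actions:
            break
        choice = policy(state, actions)
        state, start, reward = step(state, choice)
        schedule.append((choice.tid, start))
        total += reward
    return schedule, total


if __name__ == "__main__":
    tasks = [Task(0, 0.0, 100.0, 10.0, 3.0),
             Task(1, 20.0, 60.0, 10.0, 5.0),
             Task(2, 50.0, 80.0, 15.0, 4.0)]
    schedule, total = construct(tasks, greedy_policy)
    print([tid for tid, _ in schedule])  # [1, 2, 0]
    print(total)                         # 12.0
```

Because the transition time here depends on when the satellite departs, feasibility must be re-evaluated after every action; in a learned model, this masking step is what restricts the policy to valid next tasks.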