RCM: A Neural Policy Model With Reconstruction Mechanism to Construct a Solution for the Agile Satellite Scheduling Problem

IEEE Transactions on Cybernetics · 2025

Cited by 8

ABS 3

Ming Chen
Jie Chun
Yong‐Ming He
Xiaolu Liu
Guohua Wu
Witold Pedrycz

Summary

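For the agile Earth observation satellite scheduling problem, this work proposes a reconstruction model based on deep reinforcement learning. A policy model first constructs an initial solution, and repair and removal operations then correct its decision errors. The model outperforms existing iterative search methods within 0.1 seconds.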

Abstract

The agile Earth observation satellite scheduling problem (AEOSSP) with time-dependent transition times is a challenging combinatorial optimization problem. Because it is NP-hard, problem-tailored methods tend to be sensitive to the instance at hand and require massive computational overhead. Recently, deep reinforcement learning (DRL) models have shown promise for solving the AEOSSP efficiently. However, because DRL training maximizes the expected average reward rather than the accuracy of each individual decision, these models may make decision mistakes in specific scenarios, which directly wastes resources. To address these issues, we propose a reconstruction model (RCM), a DRL-based two-stage model consisting of a construction model (CM) and a reconstruction mechanism (RM). RCM first constructs solutions with a DRL-trained CM and then refines them with RM. CM uses a more efficient network for policy representation to make decisions. RM applies two operators, "repair" and "removal," in a "repair-removal-repair" reconstruction process that identifies and rectifies decision mistakes made by CM, providing a modular component that improves stability and solution quality. Experimental results show that the proposed RCM outperforms the state-of-the-art iterative search method for the AEOSSP while requiring only 0.1 s of computation time. In addition, CM surpasses the state-of-the-art DRL policy model, and RM effectively rectifies decision errors and suboptimal choices, demonstrating its effectiveness in improving DRL outcomes.
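
The abstract describes the "repair-removal-repair" process only at a high level. The Python sketch below illustrates the general idea on a toy single-satellite model. Everything in it is an assumption chosen for illustration, not the paper's method: the `Task` fields, the `transition_time` formula, greedy insertion as the "repair" operator, and lowest profit per occupied second as the "removal" criterion are all hypothetical. The DRL-trained CM is replaced by a hand-written initial sequence that contains one deliberate mistake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    tid: int
    profit: float
    win_start: float  # earliest allowed observation start (s)
    win_end: float    # latest allowed observation end (s)
    duration: float   # observation duration (s)
    roll: float       # hypothetical roll angle (deg) needed for the target


def transition_time(a, b, t):
    """Hypothetical time-dependent slew time: 5 s settling plus the roll
    difference at a nominal 1 deg/s, slowed by a factor growing with t."""
    return 5.0 + abs(a.roll - b.roll) * (1.0 + 0.001 * t)


def simulate(seq):
    """Return start times if the ordered sequence is feasible, else None."""
    starts, prev, finish = [], None, 0.0
    for task in seq:
        ready = finish if prev is None else finish + transition_time(prev, task, finish)
        start = max(ready, task.win_start)
        if start + task.duration > task.win_end:
            return None  # the observation would miss its window
        starts.append(start)
        prev, finish = task, start + task.duration
    return starts


def total_profit(seq):
    return sum(t.profit for t in seq)


def repair(seq, pool):
    """Hypothetical repair: insert unscheduled tasks in descending profit
    order at the first position that keeps the sequence feasible."""
    seq = list(seq)
    scheduled = {t.tid for t in seq}
    for task in sorted(pool, key=lambda t: -t.profit):
        if task.tid in scheduled:
            continue
        for pos in range(len(seq) + 1):
            cand = seq[:pos] + [task] + seq[pos:]
            if simulate(cand) is not None:
                seq, _ = cand, scheduled.add(task.tid)
                break
    return seq


def removal(seq, k):
    """Hypothetical removal: drop the k tasks with the lowest profit per
    second of occupied time (slew, idle wait and observation)."""
    starts = simulate(seq)
    scores = []
    for i, task in enumerate(seq):
        busy_from = 0.0 if i == 0 else starts[i - 1] + seq[i - 1].duration
        occupied = starts[i] + task.duration - busy_from
        scores.append((task.profit / occupied, i))
    drop = {i for _, i in sorted(scores)[:k]}
    return [t for i, t in enumerate(seq) if i not in drop]


def reconstruct(initial, pool, k=1):
    """Repair-removal-repair; keep the rebuilt schedule only if it earns more."""
    s1 = repair(initial, pool)
    s3 = repair(removal(s1, k), pool)
    return s3 if total_profit(s3) > total_profit(s1) else s1


# Usage: task 0 (low profit) occupies the window that task 1 (high profit) needs.
A = Task(0, profit=1, win_start=0, win_end=20, duration=10, roll=0)
B = Task(1, profit=10, win_start=0, win_end=20, duration=10, roll=0)
C = Task(2, profit=5, win_start=30, win_end=60, duration=10, roll=10)
pool = [A, B, C]
initial = [A, C]  # stand-in for a policy-built solution containing a mistake
result = reconstruct(initial, pool, k=1)
print([t.tid for t in result], total_profit(result))  # [1, 2] 15
```

In this toy instance, the first repair cannot insert task 1 because task 0 already uses its window. The removal step then drops task 0 (lowest profit per occupied second), and the second repair inserts task 1, raising total profit from 6 to 15. Since repair only adds tasks and the rebuilt schedule is accepted only when it earns more, this sketch never returns a worse schedule than a feasible initial one.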

Satellite scheduling · Deep reinforcement learning · Combinatorial optimization · Operations research
