A novel hybrid intelligent scheduling: integrating human feedback into reinforcement learning for adaptive preference objectives
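We propose a hybrid intelligent scheduling method that integrates human feedback into reinforcement learning to handle dynamically changing preference objectives in aerospace component manufacturing; experiments show that it outperforms existing methods.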
Reinforcement learning (RL) is an efficient approach to scheduling problems and offers good real-time performance. However, scheduling in aerospace component manufacturing (ACM) often involves multiple objectives, and decision-makers' preferences among them change dynamically during real production. In addition, specifying a numerical reward function for the different objectives typically requires meticulous manual tuning by experts. To overcome these challenges, we present a novel hybrid intelligent scheduling method that integrates human feedback into RL (HIS-HFRL) for adaptive preference objectives. We focus on three objectives: total tardiness, maximum tardiness, and total inventory and delay costs. In HIS-HFRL, the reward model is built from human feedback. Composite rules are simulated to generate trajectories and obtain objective values, which human experts then score according to their current preferences. The states in each trajectory are labelled with rewards according to these scores, and the resulting samples, each pairing a state with its reward label, are collected to construct the reward model. Finally, a double deep Q-network-based training algorithm is developed to train agents with this reward model, enabling effective scheduling decisions for machine assignment and operation sequencing. Extensive experiments in an ACM workshop demonstrate the superiority of HIS-HFRL over existing methods across various scenarios.
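To make the data flow of the reward-labelling and training steps concrete, the following is a minimal sketch and not the paper's implementation. It assumes that every state in a trajectory simply inherits the expert score given to that trajectory (the abstract does not specify the exact labelling scheme), and it uses the standard double DQN target, in which the online network selects the next action and the target network evaluates it. The names `Trajectory`, `label_states`, and `double_dqn_target` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

State = Tuple[float, ...]  # hypothetical state encoding (feature vector)


@dataclass
class Trajectory:
    """One simulated run of a composite rule (illustrative structure)."""
    states: List[State]
    # (total tardiness, maximum tardiness, total inventory and delay cost)
    objectives: Tuple[float, float, float]


def label_states(
    trajectories: Sequence[Trajectory],
    scores: Sequence[float],
) -> List[Tuple[State, float]]:
    """Build (state, reward label) samples for fitting a reward model.

    Assumption: each state receives the expert score of its trajectory.
    """
    if len(trajectories) != len(scores):
        raise ValueError("need exactly one expert score per trajectory")
    samples = []
    for traj, score in zip(trajectories, scores):
        for state in traj.states:
            samples.append((state, score))
    return samples


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Standard double DQN target for one transition.

    The online network picks the greedy next action; the target network
    supplies the value of that action. `reward` would come from the
    learned reward model rather than a hand-tuned function.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

In such a setup, the samples returned by `label_states` would be used to fit a regressor from states to rewards, and the predictions of that regressor would be passed as `reward` when computing training targets. When preferences change, only the expert scores and the reward model would need to be refreshed, not a hand-written reward function.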