A novel hybrid intelligent scheduling: integrating human feedback into reinforcement learning for adaptive preference objectives

International Journal of Production Research · 2025

Cited by: 2

ABS 3

Chen Ding
Fei Qiao
Dongyuan Wang
Juan Liu (corresponding author)

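Summary

Proposes a hybrid intelligent scheduling method that integrates human feedback into reinforcement learning to handle dynamically changing preference objectives in aerospace component manufacturing; experiments show that it outperforms existing methods.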

Abstract

Reinforcement learning (RL) is an efficient method for addressing scheduling problems with good real-time performance. However, scheduling in aerospace component manufacturing (ACM) often involves multiple objectives, with decision-makers' preferences changing dynamically during real production. Additionally, specifying a numerical reward function for different objectives typically requires meticulous manual tuning by experts. To overcome these challenges, we present a novel hybrid intelligent scheduling method that integrates human feedback into RL (HIS-HFRL) for adaptive preference objectives. We focus on three objectives: total tardiness, maximum tardiness, and total inventory and delay costs. In HIS-HFRL, the reward model is developed by incorporating human feedback. Composite rules are simulated to generate trajectories and obtain objective values, which are then scored by human experts based on current preferences. States in different trajectories are labelled with rewards according to these scores. In this way, samples pairing a state with its reward label are collected to construct the reward model. Finally, a double deep Q-network-based training algorithm is developed to train agents using this reward model, enabling effective scheduling decisions for machine assignment and operation sequencing. Extensive experiments in an ACM workshop demonstrate the superiority of HIS-HFRL over existing methods across various scenarios.
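
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`tardiness_objectives`, `label_states`, `double_dqn_target`) are hypothetical, and the labelling rule, in which every state in a trajectory inherits the expert's score for that trajectory, is an assumption, since the abstract does not say how scores are mapped to states. The inventory and delay cost objective is left out because the abstract does not define it. The last function is the standard Double DQN target, in which the online network selects the next action and the target network evaluates it.

```python
from typing import List, Sequence, Tuple


def tardiness_objectives(completion: Sequence[float],
                         due: Sequence[float]) -> Tuple[float, float]:
    """Return (total tardiness, maximum tardiness) for one simulated schedule."""
    tardiness = [max(0.0, c - d) for c, d in zip(completion, due)]
    return sum(tardiness), max(tardiness, default=0.0)


def label_states(trajectory: Sequence[Sequence[float]],
                 score: float) -> List[Tuple[Tuple[float, ...], float]]:
    """Pair each state with a reward label.

    Assumption: every state in the trajectory inherits the score that the
    expert gave to the trajectory as a whole.
    """
    return [(tuple(state), score) for state in trajectory]


def double_dqn_target(reward: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      gamma: float = 0.99,
                      done: bool = False) -> float:
    """Standard Double DQN target for a single transition."""
    if done:
        return reward
    # The online network chooses the greedy next action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network supplies the value of that action.
    return reward + gamma * q_target_next[best_action]
```

In this sketch, the (state, reward) pairs returned by `label_states` would serve as training data for a reward model, and the predictions of that model would supply the `reward` argument of `double_dqn_target` during agent training.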

Keywords: reinforcement learning; production scheduling; aerospace manufacturing; multi-objective optimization