Scalable policies for the dynamic traveling multi-maintainer problem with alerts
针对带警报的动态多维护人员旅行问题,提出一种迭代深度强化学习算法,通过重构动作空间和智能初始解,实现大规模资产网络的高效维护,实验表明算法可扩展且鲁棒。
Downtime of industrial assets such as wind turbines and medical imaging devices is costly. To avoid such downtime costs, companies seek to initiate maintenance just before failure, which is challenging because: (i) Asset failures are notoriously difficult to predict, even in the presence of real-time monitoring devices which signal degradation; and (ii) Limited resources are available to serve a network of geographically dispersed assets. In this work, we study the dynamic traveling multi-maintainer problem with alerts ( K -DTMPA) under perfect condition information with the objective to devise scalable solution approaches to maintain large networks with K maintenance engineers. Since such large-scale K -DTMPA instances are computationally intractable, we propose an iterative deep reinforcement learning (DRL) algorithm optimizing long-term discounted maintenance costs. The efficiency of the DRL approach is vastly improved by a reformulation of the action space (which relies on the Markov structure of the underlying problem) and by choosing a smart, suitable initial solution. The initial solution is created by extending existing heuristics with a dispatching mechanism. These extensions further serve as compelling benchmarks for tailored instances. We demonstrate through extensive numerical experiments that DRL can solve single maintainer instances up to optimality, regardless of the chosen initial solution. Experiments with hospital networks containing up to 35 assets show that the proposed DRL algorithm is scalable. Lastly, the trained policies are shown to be robust against network modifications such as removing an asset or an engineer or yield a suitable initial solution for the DRL approach.