Combining deep reinforcement learning and multi-stage stochastic programming to address the supply chain inventory management problem
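Proposes a heuristic that combines deep reinforcement learning with multi-stage stochastic programming to solve the inventory management problem in two-echelon divergent supply chains, significantly outperforming pure deep reinforcement learning algorithms in total cost.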
We introduce a novel heuristic for the supply chain inventory management problem in two-echelon divergent supply chains. The proposed heuristic improves on the current state of the art by combining deep reinforcement learning with multi-stage stochastic programming. In particular, we employ deep reinforcement learning to determine the number of production batches, while multi-stage stochastic programming is used for shipping decisions. To support further research, we make publicly available a software environment that simulates a wide range of two-echelon divergent supply chain settings, including different types of seasonal demand. We present a rich set of numerical experiments that consider constraints on production and warehouse capacities under fixed and variable logistics costs. The results demonstrate that the proposed heuristic significantly and consistently outperforms pure deep reinforcement learning algorithms in terms of total cost. Moreover, it overcomes several inherent limitations of multi-stage stochastic programming models, further highlighting its potential for solving the supply chain inventory management problem.
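To make the split between the two decision types concrete, the following is a minimal sketch of one simulation period in a two-echelon divergent chain (one factory warehouse feeding several local warehouses). It is not the authors' published environment: the names `ChainState` and `step`, the cost parameters, the lost-sales treatment of unmet demand, and the rule that production above capacity is discarded are all illustrative assumptions. In the proposed heuristic, `batches` would come from a trained deep reinforcement learning policy and `shipments` from solving a multi-stage stochastic program; here both are simply passed in as arguments.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChainState:
    factory_stock: int       # units held at the factory warehouse (hypothetical name)
    local_stock: List[int]   # units held at each local warehouse


def step(state: ChainState, batches: int, shipments: List[int], demands: List[int],
         batch_size: int = 10, factory_capacity: int = 100, local_capacity: int = 50,
         unit_production_cost: float = 1.0, unit_shipping_cost: float = 0.5,
         unit_holding_cost: float = 0.1, unit_shortage_cost: float = 2.0,
         ) -> Tuple[ChainState, float]:
    """Advance the chain by one period and return (next_state, period_cost).

    All parameter values are assumed for illustration only.
    """
    # Production decision: the part the abstract assigns to the DRL agent.
    produced = batches * batch_size
    # Assumption: units above the factory warehouse capacity are discarded
    # (but still incur production cost).
    factory = min(state.factory_stock + produced, factory_capacity)

    # Shipping decision: the part the abstract assigns to the stochastic program.
    shipped_total = 0
    shortage = 0
    local = []
    for stock, ship, demand in zip(state.local_stock, shipments, demands):
        # Clip each shipment to the remaining factory stock and local free space.
        ship = max(0, min(ship, factory, local_capacity - stock))
        factory -= ship
        shipped_total += ship
        available = stock + ship
        sold = min(available, demand)
        shortage += demand - sold        # assumption: unmet demand is lost
        local.append(available - sold)

    cost = (unit_production_cost * produced
            + unit_shipping_cost * shipped_total
            + unit_holding_cost * (factory + sum(local))
            + unit_shortage_cost * shortage)
    return ChainState(factory, local), cost
```

A call such as `step(ChainState(20, [5, 0]), batches=2, shipments=[10, 30], demands=[12, 40])` shows how a shortage at the second warehouse enters the period cost even when the factory ships everything it holds.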