Deep Policy Iteration with Integer Programming for Inventory Management

Manufacturing & Service Operations Management · 2025

Cited by 10 · Top 4% among papers in the same journal and year

Renmin University A · FT50 · UTD24 · ABS 3

Pavithra Harsha · IBM Research, Thomas J. Watson Research Center
Ashish Jagmohan
Jayant Kalagnanam · IBM Research, Thomas J. Watson Research Center
Brian Quanz · IBM Research, Thomas J. Watson Research Center
Divya Singhvi · New York University

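Summary

The paper proposes PARL, a reinforcement learning framework that combines neural networks with mathematical programming to solve inventory replenishment problems with combinatorial action spaces and state-dependent constraints. In complex supply chain settings, it outperforms existing methods by as much as 14.7% on average.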

Abstract

Problem definition: In this paper, we present a reinforcement learning (RL)-based framework for optimizing long-term discounted reward problems with large combinatorial action spaces and state-dependent constraints. These characteristics are common to many operations management problems, for example, network inventory replenishment, where managers have to deal with uncertain demand, lost sales, and capacity constraints that result in more complex feasible action spaces. Our proposed programmable actor RL (PARL) uses a deep policy iteration method that leverages neural networks to approximate the value function and combines it with mathematical programming and sample average approximation to solve the per-step action problem optimally while accounting for combinatorial action spaces and state-dependent constraint sets. Methodology/results: We then show how the proposed methodology can be applied to complex inventory replenishment problems where analytical solutions are intractable. We also benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find that the proposed algorithm considerably outperforms existing methods, by as much as 14.7% on average in various complex supply chain settings. Managerial implications: We find that this improvement in the performance of PARL over benchmark algorithms can be directly attributed to better inventory cost management, especially in inventory-constrained settings. Furthermore, in simpler settings where the optimal replenishment policy is tractable or near-optimal heuristics are known, we find that the RL-based policies can learn near-optimal policies. Finally, to make RL algorithms more accessible to inventory management researchers, we also discuss the development of a modular Python library that can be used to test the performance of RL algorithms with various supply chain structures. This library can spur future research in developing practical and near-optimal algorithms for inventory management problems. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2022.0617.
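
To make the per-step action problem in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-node lost-sales system with a storage capacity constraint. Two things are swapped in for simplicity: a placeholder function `value_fn` stands in for the trained neural network, and plain enumeration over the feasible integer orders stands in for the integer program. All names, costs, and parameters (`step`, `feasible_orders`, `saa_greedy_action`, `price`, `capacity`, and so on) are hypothetical.

```python
import random


def step(inventory, order, demand, price=5.0, order_cost=2.0, holding_cost=1.0):
    """One period of a single-node lost-sales system (illustrative costs).

    Returns (reward, next_inventory). Demand beyond available stock is lost.
    """
    available = inventory + order
    sales = min(available, demand)
    next_inventory = available - sales
    reward = price * sales - order_cost * order - holding_cost * next_inventory
    return reward, next_inventory


def feasible_orders(inventory, capacity=10):
    """State-dependent constraint: on-hand stock plus order must fit in storage."""
    return range(0, capacity - inventory + 1)


def saa_greedy_action(inventory, value_fn, demand_samples, gamma=0.9):
    """Choose the order maximizing the sample average of reward + gamma * V(next).

    Enumeration replaces the integer program here (an assumption made for
    brevity); it is only workable because this toy action space is tiny.
    """
    best_order, best_score = 0, float("-inf")
    for order in feasible_orders(inventory):
        total = 0.0
        for demand in demand_samples:
            reward, next_inventory = step(inventory, order, demand)
            total += reward + gamma * value_fn(next_inventory)
        score = total / len(demand_samples)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Usage: sampled demands and a placeholder (hypothetical) value estimate.
rng = random.Random(0)
demand_samples = [rng.randint(0, 8) for _ in range(200)]
value_fn = lambda inv: 2.0 * inv  # stand-in for a learned value network
order, score = saa_greedy_action(3, value_fn, demand_samples)
```

In a network setting the action is a vector of orders across nodes, so enumeration grows exponentially. That is the situation in which the abstract's combination of mathematical programming with a neural value approximation becomes relevant.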

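Inventory management · Reinforcement learning · Operations research · Mathematical optimization · Supply chain management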