Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management

Production and Operations Management · 2024

Cited by 26 · Top 4% of papers in the same journal and year

Journal rankings: Renmin University of China A · FT50 · UTD24 · ABS 4

Xiaotian Liu · Peking University
Yijie Peng · Peking University
Yaodong Yang · Peking University
Ming Hu · University of Toronto (corresponding author)

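Summary

The paper applies HAPPO, a multi-agent deep reinforcement learning algorithm, to decentralized multi-echelon inventory management. It finds that HAPPO achieves lower costs than single-agent methods and alleviates the bullwhip effect, and it shows how information sharing and the choice of each actor's objective affect system performance.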

Abstract

We apply heterogeneous-agent proximal policy optimization (HAPPO), a multi-agent deep reinforcement learning (MADRL) algorithm, to decentralized multi-echelon inventory management problems in both a serial supply chain and a supply chain network. We also examine whether the upfront-only information-sharing mechanism used in MADRL helps alleviate the bullwhip effect. Our results show that policies constructed by HAPPO achieve lower overall costs than policies constructed by single-agent deep reinforcement learning and other heuristic policies. Also, the application of HAPPO results in a less significant bullwhip effect than policies constructed by single-agent deep reinforcement learning where information is not shared among actors. Somewhat surprisingly, compared to using the overall costs of the system as a minimization target for each actor, HAPPO achieves lower overall costs when the minimization target for each actor is a combination of its own costs and the overall costs of the system. Our results provide a new perspective on the benefit of information sharing inside the supply chain that helps alleviate the bullwhip effect and improve the overall performance of the system. Upfront information sharing and action coordination in model training among actors are essential, the former even more so, for improving a supply chain's overall performance when applying MADRL. Neither actors being fully self-interested nor actors being fully system-focused leads to the best practical performance of policies learned and constructed by MADRL. Our results also verify MADRL's potential in solving various multi-echelon inventory management problems with complex supply chain structures and in non-stationary market environments.
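
To make two ideas from the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the functional form of the combined minimization target, so the convex combination and its `weight` parameter are assumptions made for illustration. The bullwhip measure used here (variance of orders divided by variance of demand) is one common definition and may differ from the metric used in the paper. All numbers are made up.

```python
from statistics import pvariance


def blended_cost(own_cost, system_cost, weight):
    """Hypothetical per-actor minimization target.

    weight = 1.0 means fully self-interested (own cost only);
    weight = 0.0 means fully system-focused (overall cost only).
    The convex combination is an assumption, not taken from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * own_cost + (1.0 - weight) * system_cost


def bullwhip_ratio(orders, demand):
    """Variance of orders placed divided by variance of demand faced.

    A ratio above 1 indicates that order variability is amplified
    relative to demand variability.
    """
    return pvariance(orders) / pvariance(demand)


# Illustrative per-echelon costs for a three-stage serial chain.
own_costs = [12.0, 8.0, 5.0]
system_cost = sum(own_costs)
targets = [blended_cost(c, system_cost, weight=0.5) for c in own_costs]

# Illustrative demand seen by one actor and the orders it places upstream.
demand = [10.0, 12.0, 8.0, 10.0]
orders = [10.0, 14.0, 6.0, 10.0]
ratio = bullwhip_ratio(orders, demand)
```

In this sketch, the abstract's finding corresponds to an intermediate `weight` outperforming both endpoints; which value works best is an empirical question that the sketch does not answer.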

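Supply chain management · Inventory management · Reinforcement learning · Operations management · Artificial intelligence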
