Inventory replenishment and fulfilment decisions for an omnichannel retailer: a reinforcement learning-based method

International Journal of Production Research · 2025

Cited by 2

ABS rating: 3

Maryam Kolyaei (corresponding author)
Lele Zhang
Michelle Blom

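Summary

Studies replenishment and fulfilment decisions for an omnichannel retailer in a capacity-constrained network, using a deep reinforcement learning algorithm to maximise expected total profit. Experiments show that the method outperforms baseline methods in high-dimensional decision-making and uncertain environments.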

Abstract

We address the replenishment and fulfilment challenges faced by an omnichannel retailer within a capacitated retail network, selling products to a large region across a multi-period horizon. This horizon is partitioned into cycles: replenishment occurs at the start of each cycle, while fulfilment decisions, namely how much to replenish and how to allocate inventory across sales channels, are made in each time period. Our model considers Click and Collect (C&C), also known as Buy Online, Pick up in Store (BOPS), as well as ship-from-store strategies, with the aim of maximising the retailer's expected total profit. The problem is formulated as a Markov Decision Process (MDP). To solve the MDP, we adopt a tailored Proximal Policy Optimisation (PPO) algorithm, a form of Deep Reinforcement Learning (DRL). We conduct experiments across varying numbers of products and stores, store capacities, and levels of demand variability to evaluate the performance and robustness of our approach. Furthermore, we evaluate the impact of different demand patterns by first training decision-making policies on specific patterns and then testing them on alternative patterns. Our results show that the tailored approach effectively handles high-dimensional decision-making, different demand patterns, uncertainty, and capacity-constrained environments while improving profitability compared with baseline methods.
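The abstract names a tailored PPO algorithm as the solver but does not state its objective. As a point of reference only, the sketch below computes the standard (untailored) PPO clipped surrogate objective, L^CLIP = mean(min(r·A, clip(r, 1 − ε, 1 + ε)·A)), over a batch of probability ratios r = π_new(a|s) / π_old(a|s) and advantage estimates A. The function name `ppo_clip_objective` and the default ε = 0.2 are illustrative assumptions. The paper's own tailoring, for example how replenishment and allocation actions are parameterised, is not reproduced here.

```python
import numpy as np


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximised).

    Hypothetical helper for illustration: this is the generic PPO-Clip
    objective, NOT the paper's tailored variant.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled state-action pair
    advantage: estimated advantage A(s, a) for each pair
    eps:       clipping parameter (0.2 is an assumed, commonly used default)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)

    # Surrogate without any constraint on how far the policy moves.
    unclipped = ratio * advantage
    # Surrogate with the ratio restricted to [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage

    # Pessimistic bound: elementwise minimum, averaged over the batch.
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    # Two samples: one favourable action pushed too far, one unfavourable.
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0], eps=0.2))
```

With ratios [1.5, 0.5], advantages [1.0, −1.0] and ε = 0.2, the minimum selects 1.2 for the first sample and −0.8 for the second, giving an objective of about 0.2. Taking the minimum keeps the update pessimistic, so the policy gains nothing from moving its probability ratio outside [1 − ε, 1 + ε] in the direction the advantage favours.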

Keywords: omnichannel retail; inventory management; reinforcement learning; operations management