Deep Reinforcement Learning for Online Assortment Customization: A Data-Driven Approach

Production and Operations Management · 2025

Cited by 1

Journal rankings: Renmin University of China list A · FT50 · UTD24 · ABS 4

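Tao Li · Hong Kong University of Science and Technology
Chenhao Wang · Tongji University
Yao Wang · Xi'an Jiaotong University
Shaojie Tang · University at Buffalo
Ningyuan Chen · University of Toronto (corresponding author)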

Summary

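For platforms with limited inventory, the paper proposes a data-driven deep reinforcement learning method. It builds a simulator from historical transaction data and uses a deep neural network together with the advantage actor-critic algorithm to customize assortments dynamically, with the goal of maximizing long-term revenue.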
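The simulator loop summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Customers arrive as a sequence of type labels. Each type chooses according to a multinomial logit (MNL) model; this choice model is an assumption, since the paper builds its simulator from historical transaction data. The hypothetical `policy_scores` function stands in for the DNN. Out-of-stock products are masked out before an assortment is sampled, which is one simple way to enforce the inventory constraint.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def masked_assortment(scores, inventory, rng):
    """Offer each in-stock product independently with probability sigmoid(score).

    Hypothetical policy head: products with zero inventory are masked out,
    so the sampled assortment always satisfies the inventory constraint.
    """
    assortment = []
    for j, (score, stock) in enumerate(zip(scores, inventory)):
        if stock <= 0:
            continue  # depleted product: never offered
        if rng.random() < sigmoid(score):
            assortment.append(j)
    return assortment


def mnl_choice(assortment, utilities, rng):
    """Assumed MNL customer: buy j with prob exp(u_j) / (1 + sum_k exp(u_k)).

    Returns the chosen product index, or None for the no-purchase option
    (whose weight is fixed at 1.0).
    """
    weights = [math.exp(utilities[j]) for j in assortment]
    r = rng.random() * (1.0 + sum(weights))
    acc = 0.0
    for j, w in zip(assortment, weights):
        acc += w
        if r < acc:
            return j
    return None


def simulate_episode(arrivals, prices, inventory, policy_scores,
                     type_utilities, seed=0):
    """Run one selling horizon and return (total revenue, final inventory)."""
    rng = random.Random(seed)
    inventory = list(inventory)  # copy so the caller's list is not mutated
    revenue = 0.0
    for customer_type in arrivals:
        scores = policy_scores(customer_type, inventory)  # stand-in for the DNN
        offered = masked_assortment(scores, inventory, rng)
        choice = mnl_choice(offered, type_utilities[customer_type], rng)
        if choice is not None:
            inventory[choice] -= 1  # stays >= 0 because only in-stock items are offered
            revenue += prices[choice]
    return revenue, inventory
```

Sampling each product independently keeps the masking trivial. A cardinality-constrained or otherwise structured assortment head would need a different sampler.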

Abstract

When a platform has limited inventory, it is important to offer each customer a variety of products while managing the remaining stock. To maximize revenue over the long term, the assortment policy needs to take into account the complex purchasing behavior of customers whose arrival orders and preferences may be unknown. We propose a data-driven approach for dynamic assortment planning that utilizes historical customer arrivals and transaction data. To address the challenge of online assortment customization, we formulate the problem as a Markov decision process and, because solving it exactly is computationally challenging, employ a model-free deep reinforcement learning (DRL) approach to learn the online assortment policy. Our method uses a specially designed deep neural network (DNN) model to generate assortments that satisfy the inventory constraints, and an advantage actor-critic algorithm, run on a simulator built from the historical transaction data, to update the parameters of the DNN model. To evaluate the effectiveness of our approach, we conduct simulations using both a synthetic data set, generated from a predetermined customer-type distribution and a ground-truth choice model, and a real-world data set. Our extensive experiments demonstrate that our approach produces significantly higher long-term revenue than several existing methods and remains robust under various practical conditions. We also demonstrate that our approach can be easily adapted to a more general problem with reusable products, where customers may return purchased items. In this setting, our approach performs well under various usage-time distributions.
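The advantage actor-critic update mentioned in the abstract can be illustrated by the quantities it computes from one simulated episode. The sketch below is a hedged, framework-free illustration, not the paper's training code. It assumes an undiscounted finite-horizon objective by default (`gamma = 1.0`), with per-step log-probabilities and critic values supplied by some network. It computes the returns R_t = r_t + gamma * R_{t+1}, the advantages A_t = R_t - V(s_t), and the resulting actor and critic losses. In practice these losses would be built in an automatic-differentiation framework so that their gradients update the DNN. The function names are hypothetical.

```python
def discounted_returns(rewards, gamma, bootstrap_value=0.0):
    """Compute R_t = r_t + gamma * R_{t+1}, starting from R_T = bootstrap_value.

    Use bootstrap_value = 0.0 when the episode ends at a terminal state
    (e.g. the selling horizon is over), or the critic's V(s_T) otherwise.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def a2c_losses(log_probs, values, rewards, gamma=1.0, bootstrap_value=0.0):
    """Return (actor_loss, critic_loss, advantages) for one trajectory.

    log_probs[t] = log pi(a_t | s_t) of the assortment actually offered,
    values[t]    = critic estimate V(s_t),
    rewards[t]   = revenue earned at step t.
    """
    n = len(rewards)
    if n == 0 or not (len(log_probs) == len(values) == n):
        raise ValueError("need non-empty log_probs, values, rewards of equal length")
    returns = discounted_returns(rewards, gamma, bootstrap_value)
    advantages = [ret - v for ret, v in zip(returns, values)]
    # Actor: minimize -A_t * log pi(a_t | s_t); the advantage is treated as a
    # constant (not differentiated) when the policy gradient is taken.
    actor_loss = -sum(a * lp for a, lp in zip(advantages, log_probs)) / n
    # Critic: mean squared error between V(s_t) and the return R_t.
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss, advantages
```

For example, with rewards `[1.0, 0.0, 2.0]`, `gamma = 0.5` and constant critic values of 1.0, the returns are `[1.5, 1.0, 2.0]` and the advantages are `[0.5, 0.0, 1.0]`. Only the steps whose realized revenue exceeded the critic's estimate have their action probabilities pushed up.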

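Keywords: Operations management · Dynamic pricing and assortment planning · Reinforcement learning · Data-driven decision-making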
