An efficient learning framework for multiproduct inventory systems with customer choices

Production and Operations Management · 2022

Cited by 4

Renmin University A · FT50 · UTD24 · ABS 4

Xiangyu Gao · The Chinese University of Hong Kong (corresponding author)
Huanan Zhang · University of Colorado Boulder

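Summary

For multiproduct inventory systems in which customer purchases depend on product availability, the paper proposes a UCB-based learning framework that uses two improvement ideas to speed up learning. With more than 1000 candidate policies, it reaches an average regret of about 15% within 50 periods.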

Abstract

We consider a periodic-review multiproduct inventory system where customers' purchasing decisions are affected by the product availabilities. Demands need to be learned on the fly, through the partial and censored feedback of customers. For this learning problem, if one ignores the inventory dynamic and treats it as a multiarmed bandit problem and directly applies some existing algorithms, for example, the upper confidence bound (UCB) algorithm, the convergence can be extremely slow due to the high-dimensionality of the policy space. We propose a UCB-based learning framework that utilizes the sales information based on two improvement ideas. We illustrate how these two ideas can be incorporated by considering two specific systems: (1) multiproduct inventory system with stock-out substitutions, (2) multiproduct inventory assortment problem for urban warehouses. We develop improved UCB algorithms for both systems, using the two improvements. For both systems, the algorithm can achieve a tight worst-case convergence rate (up to a logarithmic term) on the planning horizon $T$. Extensive numerical experiments are conducted to demonstrate the efficiency of the improved UCB algorithms for the two systems. In the experiments, when there are more than 1000 candidate policies to choose from, the algorithms can achieve around $15\%$ average expected regret within 50 periods and continue to steadily improve as time increases.
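To make the baseline in the abstract concrete, here is a minimal sketch of textbook UCB1 applied to a flat set of candidate policies, each treated as an independent arm. It is not the paper's improved algorithm: it ignores inventory dynamics and sales information entirely. The function names `ucb1` and `make_bernoulli`, the Bernoulli reward model, and the reward means are all assumptions made for illustration; only the sizes (1000 policies, 50 periods) echo the abstract.

```python
import math
import random


def ucb1(reward_fns, horizon, seed=0):
    """Run textbook UCB1 over `reward_fns` (one callable per candidate policy).

    Returns the list of arm indices pulled in each period.
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms   # number of times each policy was tried
    means = [0.0] * n_arms  # running average reward of each policy
    pulls = []
    for t in range(1, horizon + 1):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]  # UCB1 first tries every arm once
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: means[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        pulls.append(arm)
    return pulls


def make_bernoulli(p):
    """Hypothetical reward model: profit normalized to {0, 1} with mean p."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# 1000 hypothetical candidate policies and 50 periods. The reward means are
# invented for illustration and are not taken from the paper.
policies = [make_bernoulli(0.2 + 0.6 * i / 999) for i in range(1000)]
pulls = ucb1(policies, horizon=50)
distinct_tried = len(set(pulls))
```

Under these assumptions the 50-period run is still in its initialization phase: it tries 50 distinct policies once each and never revisits any of them, so it has no basis yet for preferring one policy over another. This is one way to see why the abstract calls direct application of UCB extremely slow when the policy space is large, and why a framework that shares information across policies can help.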

Inventory management · Operations management · Machine learning · Multi-armed bandits
