Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses

Management Science · 2021
Cited by 6
Renmin University of China (RUC) A+ · FT50 · UTD24 · ABS 4*

Overview

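Studies algorithms for dynamically selecting a large number of high-demand products for urban warehouses and proposes a new method that shrinks the upper confidence bounds. On an industrial dataset from Alibaba, the method reduces cumulative regret by at least 10% compared with the standard UCB algorithm.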

Abstract

The rising popularity of ultrafast delivery services on retail platforms is fueling the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for such online retailers: the number of stock keeping units (SKUs) they carry is no longer “the more, the better,” yet it can still be significantly large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically selecting a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers’ ultrafast delivery platforms. We distill the product selection problem into a semibandit model with linear generalization. There are N arms in total, corresponding to N products, each with a feature vector of dimension d. The player pulls K arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where K is much greater than the total number of time periods T or the dimension of product features d. We first analyze a standard Upper Confidence Bound (UCB) algorithm and show that its regret bound can be expressed as the sum of a T-independent part and a T-dependent part, which we refer to as “fixed cost” and “variable cost,” respectively. To reduce the fixed cost for large K values, we propose a novel online learning algorithm, which iteratively shrinks the upper confidence bounds within each period, and show that its fixed cost is reduced by a factor of d. Moreover, we test the algorithms on an industrial data set from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%. This paper was accepted by J. George Shanthikumar, big data analytics.
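
The abstract does not spell out either algorithm, so the following is only a minimal Python sketch of the setting under stated assumptions: a linear semi-bandit with a ridge-regression estimate (regularizer A = I), Gaussian reward noise, and an exploration weight `alpha`. With `shrink=False`, `select_arms` is a generic top-K linear UCB rule. With `shrink=True`, it applies a hypothetical reading of "iteratively shrinks the upper confidence bounds within each period": after each pick, the picked arm's feature is folded into a working copy of the inverse Gram matrix before any reward is observed, so arms similar to those already chosen get narrower bounds. This is an illustration, not the authors' algorithm, and every function name and parameter here (`select_arms`, `simulate`, `alpha`, `noise`, and so on) is hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    """Product of a list-of-rows matrix with a vector."""
    return [dot(row, v) for row in m]


def sherman_morrison(a_inv, x):
    """Return (A + x x^T)^{-1} given A^{-1}, via a rank-one update (A symmetric PD)."""
    ax = mat_vec(a_inv, x)
    denom = 1.0 + dot(x, ax)
    n = len(x)
    return [[a_inv[i][j] - ax[i] * ax[j] / denom for j in range(n)] for i in range(n)]


def ucb_score(x, theta_hat, a_inv, alpha):
    """Optimistic index: estimated mean plus alpha times the confidence width."""
    width = math.sqrt(max(dot(x, mat_vec(a_inv, x)), 0.0))
    return dot(x, theta_hat) + alpha * width


def select_arms(features, a_inv, b, k, alpha, shrink):
    """Greedily pick k distinct arms by UCB score.

    shrink=False: widths stay fixed within the period, which is equivalent to
    taking the k largest UCB scores.
    shrink=True (hypothetical variant): after each pick, update a working copy
    of A^{-1} with the picked arm's feature (no reward needed), so widths of
    arms similar to those already picked shrink within the period.
    """
    theta_hat = mat_vec(a_inv, b)  # ridge estimate theta_hat = A^{-1} b
    work = a_inv
    remaining = set(range(len(features)))
    chosen = []
    for _ in range(k):
        best = max(remaining, key=lambda i: ucb_score(features[i], theta_hat, work, alpha))
        chosen.append(best)
        remaining.remove(best)
        if shrink:
            work = sherman_morrison(work, features[best])
    return chosen


def simulate(n=100, d=4, k=30, horizon=10, alpha=1.0, noise=0.1, shrink=False, seed=0):
    """Cumulative regret of the selection rule on a random linear instance."""
    rng = random.Random(seed)
    features = [[rng.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(n)]
    theta_star = [rng.gauss(0, 1) for _ in range(d)]
    means = [dot(x, theta_star) for x in features]
    best_total = sum(sorted(means, reverse=True)[:k])  # oracle top-k value
    a_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # A = I
    b = [0.0] * d
    cumulative = 0.0
    for _ in range(horizon):
        chosen = select_arms(features, a_inv, b, k, alpha, shrink)
        cumulative += best_total - sum(means[i] for i in chosen)
        for i in chosen:  # semi-bandit feedback: one noisy reward per pulled arm
            x = features[i]
            reward = means[i] + rng.gauss(0, noise)
            a_inv = sherman_morrison(a_inv, x)
            b = [bj + reward * xj for bj, xj in zip(b, x)]
    return cumulative


if __name__ == "__main__":
    for flag in (False, True):
        print("shrink" if flag else "standard", round(simulate(shrink=flag), 3))
```

The Sherman-Morrison update keeps each covariance update at O(d^2) instead of a fresh d-by-d inversion. The toy instance is small and random, so its numbers say nothing about the 10% improvement reported on the Alibaba data; it only shows where a within-period shrinking step could sit inside a linear UCB loop.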

Keywords: dynamic product selection; semi-bandit model; upper confidence bound algorithm; urban warehouse