Rate-Optimal Online Learning for Dynamic Assortment Selection with Positioning
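This work studies the product positioning problem in online retail and proposes a framework for dynamic assortment selection with positioning, in which the TLR-UCB and EI-TLR algorithms learn the optimal placement of products to maximize revenue. Theoretical analysis proves that the algorithms achieve optimal learning efficiency, and simulations show that they significantly outperform existing methods.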
This study addresses a key challenge in online retail: product positioning. The authors propose an online learning framework called dynamic assortment selection with positioning (DAP). Unlike traditional models that focus solely on which items to offer, DAP also learns where to place each item in order to maximize revenue. Customer choices are modeled with a multinomial logit (MNL) framework in which an item's appeal depends on both intrinsic preference and display position. The authors show that ignoring position effects leads to suboptimal performance and introduce a new algorithm, TLR-UCB, which incorporates adaptive position-dependent feedback through a geometric linear bandit structure and truncated linear regression techniques. Theoretical analysis shows that TLR-UCB achieves the optimal learning rate. To handle unknown position effects, they further develop EI-TLR, a two-stage policy that first jointly estimates customer preferences and position effects and then applies a generalized TLR-UCB procedure. Extensive simulations show that both TLR-UCB and EI-TLR significantly outperform existing benchmarks, providing practical tools for dynamic, data-driven assortment and layout optimization in online marketplaces.
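To make the position-dependent choice model concrete, the following is a minimal sketch of an MNL model with display positions. It is an illustration only, not the authors' implementation. It assumes a multiplicative form in which the attraction of item i shown in slot k is v[i] * lam[k], with a no-purchase option of weight 1; the abstract states only that appeal depends on both intrinsic preference and position. The names `v`, `r`, `lam`, `choice_probs`, `expected_revenue`, and `best_ranking` are hypothetical, and the brute-force search stands in for the oracle problem with known parameters, not for TLR-UCB or EI-TLR.

```python
from itertools import permutations


def choice_probs(ranking, v, lam):
    """Purchase probability of each displayed item.

    ranking[k] is the index of the item shown in slot k.
    Assumed attraction of that item: v[item] * lam[k].
    The no-purchase option has weight 1.
    """
    weights = [v[item] * lam[k] for k, item in enumerate(ranking)]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]


def expected_revenue(ranking, v, r, lam):
    """Expected revenue of a display; r[item] is the item's revenue."""
    probs = choice_probs(ranking, v, lam)
    return sum(r[item] * p for item, p in zip(ranking, probs))


def best_ranking(v, r, lam):
    """Exhaustively search all ways to fill every slot with distinct items."""
    candidates = permutations(range(len(v)), len(lam))
    return max(candidates, key=lambda rk: expected_revenue(rk, v, r, lam))


if __name__ == "__main__":
    v = [1.0, 0.5, 2.0]    # assumed intrinsic preference weights
    r = [1.0, 2.0, 0.5]    # assumed per-item revenues
    lam = [1.0, 0.5]       # assumed position effects, slot 0 most visible
    best = best_ranking(v, r, lam)
    print(best, expected_revenue(best, v, r, lam))
```

Under this assumed form, swapping two displayed items changes both the numerator and the denominator of the revenue expression, which is why the set of items and their order have to be chosen jointly. In the online setting described in the abstract, the parameters are unknown and must be learned from purchase feedback, so an exhaustive search with known `v` and `lam` serves only as a reference point.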