Sequential Sponsored-Products and Off-Amazon Advertising Optimization for Etailers
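This paper studies the sequential decision problem in which e-commerce sellers on Amazon jointly optimize on-site sponsored-products ads and off-site ads. It proposes a Thompson-sampling-based algorithm and proves its regret bound, and is of reference value to operations management scholars and e-commerce practitioners.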
Sponsored-products (SP) advertising is a popular way to promote products on Amazon. Etailers with a large product catalog often create SP ad groups for products with similar attributes. An SP ad group consists of a set of products that share the same keyword set used for product search. In addition to SP ads, etailers may advertise their products through links on external websites, which are called off-Amazon (OA) ads. This study focuses on optimizing sequential SP and OA (abbreviated as SSPOA) ad decisions for etailers. We model the SSPOA optimization as a controlled Markovian multi-armed bandit (MAB) process. When the mean sales volume per unit time (i.e., the sales rate) of each product is known, we characterize the etailer’s optimal SSPOA policy for products in an ad group. When the parameters of the sales rates are unknown, we develop a Thompson-sampling-based algorithm that couples the SP and OA ad decisions. We prove that the regret bound of the proposed algorithm is $\tilde{O}(\sqrt{T})$, where $T$ is the total horizon length. Compared with the existing literature, our problem additionally accounts for the regret from applying the estimated control policy and for the impact of choosing non-optimal keyword sets on subsequent states. We also conduct numerical experiments that validate our theoretical results.
Moreover, we extend the base model in several directions: allowing unknown transition rates between different sales-rate levels, incorporating correlated keyword sets, and learning the optimal policy using posterior sampling for reinforcement learning (PSRL) under a discretized setting.
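As a rough illustration of the Thompson-sampling idea referred to above, the following Python sketch selects among a few keyword sets whose sales rates are unknown. It is not the algorithm proposed in this paper: it is a plain Gamma-Poisson bandit that ignores the Markovian sales-rate states and the coupling with OA ad decisions. The function name `thompson_sampling_keyword_sets`, the Poisson sales model, the Gamma(1, 1) prior, and the example rates are all assumptions made for illustration.

```python
import numpy as np


def thompson_sampling_keyword_sets(true_rates, horizon, seed=0):
    """Generic Gamma-Poisson Thompson sampling over keyword sets.

    Assumptions (not taken from the paper): sales per period under keyword
    set k are Poisson with an unknown, time-invariant rate true_rates[k],
    and each rate has an independent Gamma(1, 1) prior.
    """
    rng = np.random.default_rng(seed)
    true_rates = np.asarray(true_rates, dtype=float)
    n_sets = len(true_rates)

    alpha = np.ones(n_sets)  # Gamma shape: 1 + total sales observed
    beta = np.ones(n_sets)   # Gamma rate: 1 + number of periods played
    best_rate = true_rates.max()

    cumulative_regret = np.zeros(horizon)
    running = 0.0
    for t in range(horizon):
        # Draw one plausible sales rate per keyword set from its posterior.
        sampled_rates = rng.gamma(shape=alpha, scale=1.0 / beta)
        chosen = int(np.argmax(sampled_rates))

        # Observe sales for the chosen keyword set, then update its posterior.
        sales = rng.poisson(true_rates[chosen])
        alpha[chosen] += sales
        beta[chosen] += 1.0

        # Expected (pseudo-)regret against always playing the best set.
        running += best_rate - true_rates[chosen]
        cumulative_regret[t] = running

    posterior_means = alpha / beta
    return cumulative_regret, posterior_means


if __name__ == "__main__":
    regret, means = thompson_sampling_keyword_sets([1.0, 2.0, 3.0], horizon=2000)
    print("final cumulative regret:", regret[-1])
    print("posterior mean sales rates:", means)
```

Plotting `cumulative_regret` against the period index for several horizons is one simple way to check by eye whether regret grows sublinearly in this simplified setting. The paper's $\tilde{O}(\sqrt{T})$ bound concerns its own coupled SP and OA algorithm, not this sketch.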