Bayesian dithering for learning: Asymptotically optimal policies in dynamic pricing

Production and Operations Management · 2022

Cited by: 2

Journal rankings: Renmin University A · FT50 · UTD24 · ABS 4

Woonghee Tim Huh · University of British Columbia
Michael Jong Kim · University of British Columbia
Mei-Chun Lin · University of British Columbia (corresponding author)

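Summary

The paper studies a seller who dynamically prices multiple products while learning about unknown demand. It proposes dithering policies, which randomly select prices in a neighborhood of the myopic optimal price, and shows that they achieve asymptotically optimal regret upper bounds.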

Abstract

We consider a dynamic pricing and learning problem where a seller prices multiple products and learns from sales data about unknown demand. We study the parametric demand model in a Bayesian setting. To avoid the classical problem of incomplete learning, we propose dithering policies under which prices are probabilistically selected in a neighborhood surrounding the myopic optimal price. By analyzing the effect of dithering in facilitating learning, we establish regret upper bounds for three typical settings of demand model. We show that the dithering policy achieves an upper bound of order $\log T$ when the parameter set is finite. It can be modified to achieve a constant regret bound under an additional assumption. We also prove an upper bound of order $\sqrt{T\log T}$ when the parameter set is compact and convex. Each bound matches (up to a logarithmic factor) the existing lower bound of any pricing policy. In this way, we show that dithering policies achieve asymptotically optimal performance in three different parameter settings, which demonstrates dithering as a unified approach to strike the balance between exploration and exploitation.
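
As a rough illustration of the idea in the abstract, the sketch below simulates dithering for a single product with a finite parameter set. It is not the authors' algorithm: the linear demand model, the Gaussian noise, the uniform prior, the fixed dither half-width `delta`, and all function names are assumptions made here for illustration, since the abstract does not specify them. In particular, a constant `delta` would not by itself give the regret rates quoted above.

```python
import math
import random


def normalize(log_weights):
    """Turn unnormalized log-posterior weights into probabilities."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


def myopic_price(weights, thetas):
    """Price maximizing posterior-expected revenue p * (a - b * p).

    Assumes linear demand with theta = (a, b), so the maximizer is
    E[a] / (2 * E[b]) under the current posterior.
    """
    a_bar = sum(w * a for w, (a, _) in zip(weights, thetas))
    b_bar = sum(w * b for w, (_, b) in zip(weights, thetas))
    return a_bar / (2.0 * b_bar)


def simulate(thetas, true_index, horizon, sigma, delta, seed=0):
    """Run a dithered myopic policy; return (posterior, cumulative regret)."""
    rng = random.Random(seed)
    a_true, b_true = thetas[true_index]
    p_star = a_true / (2.0 * b_true)  # full-information optimal price
    log_w = [0.0] * len(thetas)       # uniform prior (assumption)
    regret = 0.0
    for _ in range(horizon):
        weights = normalize(log_w)
        # Dither: perturb the myopic price uniformly within +/- delta.
        price = myopic_price(weights, thetas) + rng.uniform(-delta, delta)
        demand = a_true - b_true * price + rng.gauss(0.0, sigma)
        # Bayesian update with a Gaussian likelihood for each candidate.
        log_w = [
            lw - (demand - (a - b * price)) ** 2 / (2.0 * sigma ** 2)
            for lw, (a, b) in zip(log_w, thetas)
        ]
        # For linear demand, r(p*) - r(p) = b * (p - p*)^2.
        regret += b_true * (price - p_star) ** 2
    return normalize(log_w), regret


if __name__ == "__main__":
    candidates = [(10.0, 1.0), (12.0, 2.0), (8.0, 0.5)]
    posterior, total_regret = simulate(
        candidates, true_index=0, horizon=500, sigma=0.5, delta=0.5
    )
    print(posterior, total_regret)
```

The perturbation is what keeps the candidates distinguishable: without it, a myopic policy can settle on a price at which two candidate demand curves predict the same sales, which is the incomplete-learning problem the abstract refers to.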

Keywords: dynamic pricing · Bayesian learning · exploration and exploitation · asymptotically optimal algorithms
