Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling
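Presents a general elliptical potential lemma that does not require the Gaussian assumption, and uses it to prove a minimax-optimal Bayesian regret bound for Thompson sampling in stochastic linear bandits.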
A General Elliptical Potential Lemma

In sequential learning and decision-making problems, the elliptical potential lemma is a key technique for quantifying how the uncertainty of the model decreases as more observations are obtained. However, the standard lemma requires the observation noise and the prior distribution of the unknown parameters to be Gaussian. In “The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling,” N. Hamidi and M. Bayati introduce a general version of the elliptical potential lemma that relaxes the Gaussian assumption. They also apply their general lemma to prove a minimax-optimal Bayesian regret bound for the well-known Thompson sampling algorithm in stochastic linear bandits with changing action sets, under general prior and noise distributions.
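
To make the quantity concrete, the sketch below numerically checks the classical, deterministic form of the elliptical potential lemma that is commonly used in the linear bandit literature: with V_0 = lam * I and V_t = V_{t-1} + x_t x_t^T, the sum of min(1, x_t^T V_{t-1}^{-1} x_t) is at most 2 log(det V_n / det V_0). This is not the general lemma of Hamidi and Bayati, whose statement is not given here. The function names `elliptical_potential` and `random_unit_actions`, the regularizer `lam`, and the choice of random unit-norm actions are assumptions made purely for illustration.

```python
import math
import random


def elliptical_potential(actions, lam=1.0):
    """Return (potential_sum, log_det_ratio) for a sequence of action vectors.

    potential_sum = sum_t min(1, x_t^T V_{t-1}^{-1} x_t)
    log_det_ratio = log(det V_n / det V_0), where V_0 = lam * I and
                    V_t = V_{t-1} + x_t x_t^T.
    """
    d = len(actions[0])
    # Inverse of V_0 = lam * I.
    v_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    potential_sum = 0.0
    log_det_ratio = 0.0
    for x in actions:
        # u = V_{t-1}^{-1} x and q = x^T V_{t-1}^{-1} x (squared elliptical norm).
        u = [sum(v_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        q = sum(x[i] * u[i] for i in range(d))
        potential_sum += min(1.0, q)
        # Matrix determinant lemma: det V_t = det V_{t-1} * (1 + q).
        log_det_ratio += math.log1p(q)
        # Sherman-Morrison rank-one update of the inverse.
        for i in range(d):
            for j in range(d):
                v_inv[i][j] -= u[i] * u[j] / (1.0 + q)
    return potential_sum, log_det_ratio


def random_unit_actions(n, d, seed=0):
    """Hypothetical action sequence: n random directions on the unit sphere."""
    rng = random.Random(seed)
    actions = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        actions.append([v / norm for v in g])
    return actions


if __name__ == "__main__":
    n, d, lam = 1000, 5, 1.0
    potential, log_det = elliptical_potential(random_unit_actions(n, d), lam)
    # Classical bound: potential <= 2 * log(det V_n / det V_0).
    print(potential, 2.0 * log_det)
    # For unit-norm actions, log_det <= d * log(1 + n / (d * lam)).
    print(log_det, d * math.log(1.0 + n / (d * lam)))
```

For unit-norm actions the log-determinant ratio grows only logarithmically in the number of rounds, which is why bounds of this type lead to regret guarantees that scale with the square root of the horizon up to logarithmic factors.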