Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

Operations Research · 2022

Cited by 2

Journal rankings: Renmin University of China A · FT50 · UTD24 · ABS 4*

Nima Hamidi · Stanford University
Mohsen Bayati · Stanford University

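Summary

The paper proposes a general elliptical potential lemma that does not require the Gaussian assumption, and uses it to prove a minimax optimal Bayesian regret bound for the Thompson sampling algorithm in stochastic linear bandits.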

Abstract

A General Elliptical Potential Lemma

In sequential learning and decision-making problems, the elliptical potential lemma is a key technique to quantify the decrease in the uncertainty of the model as more observations are obtained. However, it requires the observation noise and prior distribution of the unknown parameters to be Gaussian. In “The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling,” N. Hamidi and M. Bayati introduce a general version of the elliptical potential lemma that relaxes the Gaussian assumption. They also apply their general lemma to prove a minimax optimal Bayesian regret bound for the well-known Thompson sampling algorithm in stochastic linear bandits with changing action sets where prior and noise distributions are general.
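For readers unfamiliar with the lemma, the following is a minimal numerical sketch of the classical (deterministic) elliptical potential inequality, not the general version introduced in the paper. It assumes a regularized design matrix V_0 = lam * I updated as V_t = V_{t-1} + x_t x_t^T, and checks that the sum of min(1, x_t^T V_{t-1}^{-1} x_t) is at most 2 log(det V_n / det V_0). With a Gaussian prior of precision lam * I and unit-variance Gaussian noise, x^T V^{-1} x is the posterior variance of the predicted reward for action x, which is the "uncertainty" the abstract refers to. The function name `elliptical_potential`, the parameter `lam`, and the random action vectors are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def elliptical_potential(actions, lam=1.0):
    """Return (potential_sum, log_det_bound) for a sequence of action vectors.

    Hypothetical helper for illustration: `actions` has shape (n, d) and
    `lam` > 0 is the regularization (prior precision) parameter.
    """
    n, d = actions.shape
    V = lam * np.eye(d)                 # V_0 = lam * I
    log_det_v0 = d * np.log(lam)        # log det of V_0
    total = 0.0
    for x in actions:
        u = x @ np.linalg.solve(V, x)   # x^T V_{t-1}^{-1} x
        total += min(1.0, u)            # truncated potential term
        V += np.outer(x, x)             # rank-one update to V_t
    _, log_det_vn = np.linalg.slogdet(V)
    return total, 2.0 * (log_det_vn - log_det_v0)


rng = np.random.default_rng(0)
actions = rng.normal(size=(500, 5))     # arbitrary illustrative action vectors
potential_sum, bound = elliptical_potential(actions)
print(potential_sum, bound)
```

The bound grows only logarithmically in the number of rounds, which is why the lemma is useful for controlling cumulative regret in linear bandit analyses.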

Machine learning · Statistical learning · Sequential decision-making · Linear bandits
