Optimistic Gittins Indices
提出一系列乐观近似Gittins指数,结合递增折扣因子,在贝叶斯多臂老虎机问题中实现遗憾最优,且计算开销低、性能优于现有方法。
We propose a tightening sequence of optimistic approximations to the Gittins index in “Optimistic Gittins Indices.” We show that the use of these approximations in concert with the use of an increasing discount factor appears to offer a compelling alternative to state-of-the-art index schemes proposed for the Bayesian multiarmed bandit problem. We prove that the use of these optimistic indices constitutes a regret optimal algorithm. Perhaps more interestingly, the use of even the loosest of these approximations appears to offer substantial performance improvements over state-of-the-art alternatives while incurring little to no additional computational overhead relative to the simplest of these alternatives.