乐观Gittins指数

Optimistic Gittins Indices

Operations Research · 2022

被引 3

人大 AFT50UTD24ABS 4*

Vivek F. Farias · 麻省理工学院
Eli Gutin · 优步人工智能（美国）

中文导读

提出一系列乐观近似Gittins指数，结合递增折扣因子，在贝叶斯多臂老虎机问题中实现遗憾最优，且计算开销低、性能优于现有方法。

Abstract

We propose a tightening sequence of optimistic approximations to the Gittins index in “Optimistic Gittins Indices.” We show that the use of these approximations in concert with the use of an increasing discount factor appears to offer a compelling alternative to state-of-the-art index schemes proposed for the Bayesian multiarmed bandit problem. We prove that the use of these optimistic indices constitutes a regret optimal algorithm. Perhaps more interestingly, the use of even the loosest of these approximations appears to offer substantial performance improvements over state-of-the-art alternatives while incurring little to no additional computational overhead relative to the simplest of these alternatives.

贝叶斯多臂老虎机强化学习在线学习算法遗憾最优

阅读原文 ↗