加性模型下的随机连续臂老虎机:极小化极大遗憾与自适应算法

Stochastic continuum-armed bandits with additive models: Minimax regrets and adaptive algorithm

Annals of Statistics · 2022
被引 3
ABS 4★

中文导读

研究了d维随机连续臂老虎机,假设期望奖励函数为稀疏加性β-赫尔德函数,建立了极小化极大遗憾的收敛速率,并提出了一个达到最优速率的自适应算法。

Abstract

We consider d-dimensional stochastic continuum-armed bandits with the expected reward function being additive β-Hölder with sparsity s for 0<β<∞ and 1≤s≤d. The rate of convergence O˜(s·T β+12β+1) for the minimax regret is established where T is the number of rounds. In particular, the minimax regret does not depend on d and is linear in s. A novel algorithm is proposed and is shown to be rate-optimal, up to a logarithmic factor of T. The problem of adaptivity is also studied. A lower bound on the cost of adaptation to the smoothness is obtained and the result implies that adaptation for free is impossible in general without further structural assumptions. We then consider adaptive additive SCAB under an additional self-similarity assumption. An adaptive procedure is constructed and is shown to simultaneously achieve the minimax regret for a range of smoothness levels.

统计学习在线学习非参数回归高维数据