The Fragility of Optimized Bandit Algorithms
研究了多臂赌博机算法中遗憾的尾部分布,发现优化期望遗憾会导致遗憾尾部极重(如柯西分布),且模型误设会大幅增加期望遗憾,揭示了期望遗憾率与尾部重度的权衡,并提出通过增加探索来增强鲁棒性。
What Does the Distribution of Regret Look Like for Bandit Algorithms? In multiarmed bandit theory, the expected regret has long been the predominant performance measure studied in the literature. For the first time, in “The fragility of optimized bandit algorithms,” Fan and Glynn study the tail of the regret distribution. For a given bandit model, it is natural to optimize to achieve the minimum possible expected regret growth rate. However, the authors show that optimizing for expected regret alone can result in severe fragility. Optimized expected regret rates come with the side effect of extremely heavy regret tails (like those of a Cauchy distribution) and thus, significant chance of catastrophically large regret. Moreover, optimized algorithms essentially operate on a knife’s edge such that the slightest degree of model misspecification in algorithm design can result in a much larger expected regret rate. Among other insights, the authors characterize sharp trade-offs between the expected regret rate and the heaviness of the regret tail. They also show how to robustify an algorithm and lighten the regret tail through increased exploration.