优化赌博机算法的脆弱性

The Fragility of Optimized Bandit Algorithms

Operations Research · 2024

被引 4

人大 AFT50UTD24ABS 4*

Lin Tao Fan · 西北大学（美国）
Peter W. Glynn · 斯坦福大学

中文导读

研究了多臂赌博机算法中遗憾的尾部分布，发现优化期望遗憾会导致遗憾尾部极重（如柯西分布），且模型误设会大幅增加期望遗憾，揭示了期望遗憾率与尾部重度的权衡，并提出通过增加探索来增强鲁棒性。

Abstract

What Does the Distribution of Regret Look Like for Bandit Algorithms? In multiarmed bandit theory, the expected regret has long been the predominant performance measure studied in the literature. For the first time, in “The fragility of optimized bandit algorithms,” Fan and Glynn study the tail of the regret distribution. For a given bandit model, it is natural to optimize to achieve the minimum possible expected regret growth rate. However, the authors show that optimizing for expected regret alone can result in severe fragility. Optimized expected regret rates come with the side effect of extremely heavy regret tails (like those of a Cauchy distribution) and thus, significant chance of catastrophically large regret. Moreover, optimized algorithms essentially operate on a knife’s edge such that the slightest degree of model misspecification in algorithm design can result in a much larger expected regret rate. Among other insights, the authors characterize sharp trade-offs between the expected regret rate and the heaviness of the regret tail. They also show how to robustify an algorithm and lighten the regret tail through increased exploration.

多臂赌博机遗憾分析算法鲁棒性机器学习

阅读原文 ↗