Transfer learning for contextual multi-armed bandits

Annals of Statistics · 2024
Cited by 10 · Top 10% among papers in the same journal and year
ABS 4*

Summary

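The paper studies nonparametric contextual multi-armed bandits under the covariate shift model, where data from source domains assist learning in the target domain. It proposes a transfer learning algorithm that attains the minimax-optimal regret and develops a data-driven algorithm that adapts to unknown smoothness.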

Abstract

Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected from source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established and a novel transfer learning algorithm that attains the minimax regret is proposed. The results quantify the contribution of the data from the source domains for learning in the target domain in the context of nonparametric contextual multi-armed bandits. In view of the general impossibility of adaptation to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an additional self-similarity assumption. A simulation study is carried out to illustrate the benefits of utilizing the data from the source domains for learning in the target domain.

Transfer learning · Contextual multi-armed bandits · Nonparametric statistics · Econometrics · Artificial intelligence