小动作空间上通过最佳子集选择实现的近乎维度无关的稀疏线性赌博机

Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection

Journal of the American Statistical Association · 2022

被引 8

ABS 4

Yi Chen
Yining Wang
Ethan X. Fang
Zhaoran Wang
Runze Li 通讯

中文导读

研究高维线性模型下的随机上下文赌博机问题，针对有限随机动作空间，改进LinUCB算法并采用最佳子集选择方法，实现了与维度几乎无关的遗憾上界，并通过实验验证了算法的适用性和鲁棒性。

Abstract

We consider the stochastic contextual bandit problem under the high dimensional linear model. We focus on the case where the action space is finite and random, with each action associated with a randomly generated contextual covariate. This setting finds essential applications such as personalized recommendations, online advertisements, and personalized medicine. However, it is very challenging to balance the exploration and exploitation tradeoff. We modify the LinUCB algorithm in doubly growing epochs and estimate the parameter using the best subset selection method, which is easy to implement in practice. This approach achieves O(sT) regret with high probability, which is nearly independent of the “ambient” regression model dimension d. We further attain a sharper O(sT) regret by using the SupLinUCB framework and match the minimax lower bound of the low-dimensional linear stochastic bandit problem. Finally, we conduct extensive numerical experiments to empirically demonstrate our algorithms’ applicability and robustness. Supplementary materials for this article are available online.

高维线性模型上下文赌博机最佳子集选择遗憾分析

阅读原文 ↗