非线性岭型赌博机的统计复杂性与最优算法

Statistical complexity and optimal algorithms for nonlinear ridge bandits

Annals of Statistics · 2024

被引 1

ABS 4*

Nived Rajaraman
Yanjun Han
Jiantao Jiao
Kannan Ramchandran

中文导读

研究了平均结果是非线性函数时的序贯决策问题，发现除了标准参数率的学习阶段外，还存在由非线性函数决定的固定成本“燃烧期”，并给出了最优燃烧期的上下界，证明了两阶段算法在统计上最优。

Abstract

We consider the sequential decision-making problem where the mean outcome is a nonlinear function of the chosen action. Compared with the linear model, two curious phenomena arise in nonlinear models: first, in addition to the “learning phase” with a standard parametric rate for estimation or regret, there is an “burn-in period” with a fixed cost determined by the nonlinear function; second, achieving the smallest burn-in cost requires new exploration algorithms. For a special family of nonlinear functions named ridge functions in the literature, we derive upper and lower bounds on the optimal burn-in cost, and in addition, on the entire learning trajectory during the burn-in period via differential equations. In particular, a two-stage algorithm that first finds a good initial action and then treats the problem as locally linear is statistically optimal. In contrast, several classical algorithms, such as UCB and algorithms relying on regression oracles, are provably suboptimal.

机器学习在线学习非线性系统统计决策

阅读原文 ↗