🌙

强单调与指数凹博弈中自适应、双重最优的无遗憾学习:基于梯度反馈

Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

Operations Research · 2024
被引 5
人大 AFT50UTD24ABS 4*

中文导读

提出自适应OGD算法AdaOGD,无需先验参数知识,在单智能体强凸情形下达到最优遗憾(对数因子内),在多智能体强单调博弈中实现联合行动以最优速率收敛至唯一纳什均衡,并应用于报童问题。

Abstract

Feasible Online Learning with Gradient Feedback Online gradient descent (OGD) is well-known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of [Formula: see text] for strongly convex cost functions, and (2) in the multiagent setting of strongly monotone games with each agent employing OGD we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of [Formula: see text]. Whereas these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In “Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback,” M. Jordan, T. Lin, and Z. Zhou design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters. In the single-agent setting, the algorithm achieves [Formula: see text] regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs AdaOGD in strongly monotone games, the joint action converges in a last-iterate sense to a unique Nash equilibrium at a rate of [Formula: see text], again optimal up to log factors. The algorithms are illustrated in a learning version of the classic newsvendor problem, in which, because of lost sales, only (noisy) gradient feedback can be observed. The results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multiretailer settings.

博弈论在线学习优化算法运筹学