🌙

带强盗反馈的分布式在线随机约束凸优化

Distributed Online Stochastic-Constrained Convex Optimization With Bandit Feedback

IEEE Transactions on Cybernetics · 2022
被引 16
ABS 3

中文导读

研究了多智能体网络中带时变约束的分布式在线随机凸优化问题,提出自适应分布式强盗原始对偶算法,在子线性漂移下实现子线性动态遗憾和约束违反。

Abstract

This article studies the distributed online stochastic convex optimization problem with the time-varying constraint over a multiagent system constructed by various agents. The sequences of cost functions and constraint functions, both of which have dynamic parameters following time-varying distributions, are unacquainted to the agent ahead of time. Agents in the network are able to interact with their neighbors through a sequence of strongly connected and time-varying graphs. We develop the adaptive distributed bandit primal-dual algorithm whose step size and regularization sequences are adaptive and have no prior knowledge about the total iteration span T . The adaptive distributed bandit primal-dual algorithm applies bandit feedback with a one-point or two-point gradient estimator to evaluate gradient values. It is illustrated in this article that if the drift of the benchmark sequence is sublinear, then the adaptive distributed bandit primal-dual algorithm exhibits sublinear expected dynamic regret and constraint violation using both two kinds of gradient estimator to compute gradient information. We present a numerical experiment to show the performance of the proposed method.

分布式优化在线学习凸优化多智能体系统