Tackling Arbitrarily Heterogeneous Data in Asynchronous Stochastic Gradient Descent Without Worker Scheduling

INFORMS Journal on Computing · 2026

Summary

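This work proposes the dual-delayed stochastic gradient descent algorithm, which uses a server-side buffer to exploit stale gradients from all workers and thereby offset the effects of data heterogeneity. It achieves fully asynchronous training without any scheduling strategy and outperforms existing methods under highly heterogeneous data.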
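The server-side buffer idea can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the abstract does not state the update rule, so the code assumes the server keeps the most recent stochastic gradient from each worker and steps along the average of the buffer. The function name `simulate`, the quadratic local objectives, and all parameter values are hypothetical choices made for illustration only.

```python
import random


def simulate(use_buffer, centers, speeds, steps=5000, lr=0.02, noise=0.1, seed=0):
    """Toy asynchronous training on local objectives f_i(x) = 0.5 * (x - centers[i])**2.

    The global minimizer is the plain average of `centers`.
    ASSUMPTION: with use_buffer=True the server averages the latest gradient
    stored for every worker; this is one reading of "server-side buffer",
    not a rule taken from the paper.
    """
    rng = random.Random(seed)
    n = len(centers)
    x = 0.0                 # model held by the parameter server
    snapshot = [x] * n      # (possibly stale) model each worker is computing on
    buffer = [0.0] * n      # latest gradient received from each worker
    buffer_sum = 0.0        # running sum of the buffer, updated incrementally

    for _ in range(steps):
        # One worker finishes; faster workers finish more often, and no
        # scheduling decides who reports next.
        i = rng.choices(range(n), weights=speeds)[0]
        # Noisy gradient evaluated at the stale model that worker i read earlier.
        grad = (snapshot[i] - centers[i]) + rng.gauss(0.0, noise)
        if use_buffer:
            # Swap worker i's old entry for the new one, then step along the
            # average of all buffered (stale) gradients.
            buffer_sum += grad - buffer[i]
            buffer[i] = grad
            x -= lr * buffer_sum / n
        else:
            # Plain asynchronous SGD: apply the incoming gradient directly.
            x -= lr * grad
        snapshot[i] = x     # worker i pulls the new model and starts again
    return x


centers = [-10.0, 0.0, 10.0, 40.0]   # highly heterogeneous local minimizers
speeds = [8, 4, 2, 1]                # fast workers hold the small centers
target = sum(centers) / len(centers)
buffered = simulate(True, centers, speeds)
plain = simulate(False, centers, speeds)
print(f"target={target:.2f} buffered={buffered:.2f} plain={plain:.2f}")
```

In this toy setting the plain asynchronous update is pulled toward the minimizers of the fast workers, because they report more often, whereas the buffered update gives every worker equal weight regardless of speed. The extra server cost per arrival is one subtraction and one addition on the running sum, which is in the spirit of the per-iteration cost claim in the abstract.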

Abstract

We consider the distributed optimization problem with data dispersed across multiple workers under the orchestration of a parameter server. In distributed environments, variations in computation speeds and network conditions across workers often lead to significant idle times in synchronous training. Although asynchronous training has been widely explored to reduce the synchronization overhead, existing methods either assume bounded dissimilarity among workers’ local data, which hampers performance under high data heterogeneity, or rely on worker scheduling strategies that limit system asynchrony. This work proposes the dual-delayed stochastic gradient descent (DuDe-SGD) algorithm to overcome these limitations. Through a server-side buffer architecture, DuDe-SGD makes use of stale stochastic gradients from all workers to neutralize the effects of data heterogeneity while maintaining full asynchrony and a per-iteration computation cost on par with traditional asynchronous stochastic gradient descent (SGD) algorithms. Our analysis demonstrates that, for smooth nonconvex problems, DuDe-SGD achieves a convergence rate comparable to that of state-of-the-art asynchronous SGD algorithms, even with arbitrarily heterogeneous data and without adopting any worker scheduling scheme. Numerical experiments demonstrate the favorable performance of DuDe-SGD compared with existing synchronous and asynchronous SGD-based algorithms, especially in scenarios with highly heterogeneous data.

History: Accepted by Antonio Frangioni, Area Editor for Design & Analysis of Algorithms–Continuous.

Funding: The research of X. Wang was supported in part by the Young Scientists Fund of the National Natural Science Foundation of China [Grant 12501426] and the Key Program of the National Natural Science Foundation of China [Grant 62432007]. The research of J. Zhang was supported in part by the Hong Kong Research Grants Council under the Areas of Excellence Scheme [Grant AoE/E-601/22-R] and in part by the National Natural Science Foundation of China/Hong Kong Research Grants Council Collaborative Research Scheme [Grant CRS_HKUST603/22].

Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2025.1443) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2025.1443). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.

Distributed optimization · Asynchronous stochastic gradient descent · Data heterogeneity · Machine learning algorithms