通过高分辨率微分方程理解加速现象

Understanding the acceleration phenomenon via high-resolution differential equations

Mathematical Programming · 2021
被引 126 · 同刊同年前 1%
ABS 4

中文导读

研究了基于梯度的优化算法的高分辨率微分方程,区分了Nesterov加速法和Polyak重球法,并发现Nesterov法中的梯度校正项导致收敛差异,还发现Nesterov法以立方速率最小化梯度范数平方。

Abstract

Abstract Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms—Nesterov’s accelerated gradient method for strongly convex functions (NAG-) and Polyak’s heavy-ball method—we study an alternative limiting process that yields high-resolution ODEs . We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG- and Polyak’s heavy-ball method, but they allow the identification of a term that we refer to as “gradient correction” that is present in NAG- but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov’s accelerated gradient method for (non-strongly) convex functions, uncovering a hitherto unknown result—that NAG- minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG- for smooth convex functions.

优化算法微分方程凸优化机器学习理论