Fast finite-sum optimization via cyclically-sampled Hessian averaging methods
研究了基于海森平均的子采样牛顿方法,允许梯度不精确,通过确定性循环采样实现快速局部超线性收敛,适用于强凸和非凸函数,在逻辑回归中表现优于随机采样方法。
Abstract We consider minimizing finite-sum objective functions via Hessian-averaging based subsampled Newton methods. These methods allow for gradient inexactness and have fixed per-iteration Hessian approximation costs. The recent work (Na et al. 2023) demonstrated that Hessian averaging can be utilized to achieve fast $$\mathcal {O}\left( \sqrt{\tfrac{\log k}{k}}\right) $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mfenced> <mml:msqrt> <mml:mstyle> <mml:mfrac> <mml:mrow> <mml:mo>log</mml:mo> <mml:mi>k</mml:mi> </mml:mrow> <mml:mi>k</mml:mi> </mml:mfrac> </mml:mstyle> </mml:msqrt> </mml:mfenced> </mml:mrow> </mml:math> local superlinear convergence for strongly convex functions in high probability, while maintaining fixed per-iteration Hessian costs. These methods, however, require gradient exactness and strong convexity, which poses challenges for their practical implementation. To address this concern we consider Hessian-averaged methods that allow gradient inexactness via norm condition based adaptive-sampling strategies. Furthermore, to better control the error in the subsampled Hessian approximations, we utilize Hessian averaging with deterministic cyclic sampling techniques instead of random sampling, which leads to fast local superlinear convergence. We develop a comprehensive convergence theory, including global linear and sublinear convergence rates for strongly convex and nonconvex functions, respectively. Additionally, we establish an improved local superlinear convergence rate of $$\mathcal {O}\left( \tfrac{1}{k}\right) $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mfenced> <mml:mstyle> <mml:mfrac> <mml:mn>1</mml:mn> <mml:mi>k</mml:mi> </mml:mfrac> </mml:mstyle> </mml:mfenced> </mml:mrow> </mml:math> . Our analysis introduces novel techniques that differ from previous probabilistic approaches. We investigate the performance of these methods on logistic regression problems, demonstrating significant improvements in convergence over similar Hessian-averaging methods that utilize stochastic sampling.