Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator

Journal of Computational and Graphical Statistics · 2023

Cited by 16 · Top 6% among papers from the same journal and year

ABS 3

Haobo Qi
Feifei Wang (corresponding author)
Hansheng Wang

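Summary

The paper studies the fixed mini-batch gradient descent (FMGD) algorithm, which splits a large dataset into fixed subsets and computes gradients on them one after another, reducing the computational cost. Using a linear regression model, it analyzes the algorithm's numerical convergence and statistical efficiency, finds that the learning rate must be sufficiently small but should not be too small, and proposes a diminishing learning rate strategy.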

Abstract

We study here a fixed mini-batch gradient descent (FMGD) algorithm to solve optimization problems with massive datasets. In FMGD, the whole sample is split into multiple nonoverlapping partitions. Once the partitions are formed, they are then fixed throughout the rest of the algorithm. For convenience, we refer to the fixed partitions as fixed mini-batches. Then for each computation iteration, the gradients are sequentially calculated on each fixed mini-batch. Because the size of fixed mini-batches is typically much smaller than the whole sample size, it can be easily computed. This leads to much reduced computation cost for each computational iteration. It makes FMGD computationally efficient and practically more feasible. To demonstrate the theoretical properties of FMGD, we start with a linear regression model with a constant learning rate. We study its numerical convergence and statistical efficiency properties. We find that sufficiently small learning rates are necessarily required for both numerical convergence and statistical efficiency. Nevertheless, an extremely small learning rate might lead to painfully slow numerical convergence. To solve the problem, a diminishing learning rate scheduling strategy can be used. This leads to the FMGD estimator with faster numerical convergence and better statistical efficiency. Finally, the FMGD algorithms with random shuffling and a general loss function are also studied. Supplementary materials for this article are available online.
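
The procedure described in the abstract can be illustrated with a minimal sketch for least-squares linear regression with a constant learning rate. It is not the authors' implementation. The function name `fmgd_linear_regression`, its parameters, the simulated data, and the chosen learning rate are all assumptions made for illustration.

```python
import numpy as np


def fmgd_linear_regression(X, y, n_batches, learning_rate, n_epochs, seed=0):
    """Illustrative fixed mini-batch gradient descent for least squares.

    Hypothetical helper: name and signature are not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Split the sample once into non-overlapping partitions.
    # These index sets stay fixed for the rest of the algorithm.
    batches = np.array_split(rng.permutation(n), n_batches)
    theta = np.zeros(p)
    for _ in range(n_epochs):
        # Visit the fixed mini-batches sequentially, in the same order each pass.
        for idx in batches:
            Xb, yb = X[idx], y[idx]
            # Gradient of the least-squares loss 0.5 * mean((Xb @ theta - yb)**2).
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)
            theta = theta - learning_rate * grad
    return theta


# Simulated data (assumed setup, for illustration only).
rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
theta_true = np.arange(1, p + 1, dtype=float)
y = X @ theta_true + rng.normal(size=n)

# Compare the FMGD estimate with the ordinary least squares solution.
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
theta_fmgd = fmgd_linear_regression(
    X, y, n_batches=20, learning_rate=0.01, n_epochs=500
)
gap = np.linalg.norm(theta_fmgd - theta_ols)
print("distance between FMGD and OLS estimates:", gap)
```

In this sketch each update touches only one mini-batch of 100 observations, which is where the per-iteration cost saving comes from. Replacing the constant `learning_rate` with a value that shrinks across passes would be one way to experiment with the diminishing schedule the abstract mentions.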

Machine learning · Optimization algorithms · Statistics · Econometrics
