On the convergence of the gradient descent method with stochastic fixed-point rounding errors under the Polyak–Łojasiewicz inequality
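This work studies how stochastic rounding strategies affect the convergence of gradient descent under low-precision fixed-point arithmetic for problems satisfying the Polyak–Łojasiewicz inequality, and finds that biased stochastic rounding can eliminate the vanishing gradient problem and tighten the upper bound on the convergence rate.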
Abstract. When neural networks are trained with low-precision computation and fixed-point arithmetic, rounding errors often cause stagnation or otherwise harm the convergence of the optimizers. This study provides insights into the choice of appropriate stochastic rounding strategies to mitigate the adverse impact of roundoff errors on the convergence of the gradient descent method, for problems satisfying the Polyak–Łojasiewicz inequality. Within this context, we show that a biased stochastic rounding strategy can even be beneficial, insofar as it eliminates the vanishing gradient problem and forces the expected roundoff error in a descent direction. Furthermore, we obtain a bound on the convergence rate that is tighter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performance of various rounding strategies when optimizing several examples using low-precision fixed-point arithmetic.
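The stagnation described in the abstract can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a toy objective f(x) = x²/2 (which satisfies the Polyak–Łojasiewicz inequality), a fixed-point grid with spacing 2^-frac_bits, and a simplified model in which only the updated iterate is rounded. The function names and parameter values are hypothetical. Only round-to-nearest and unbiased stochastic rounding are shown, since the biased schemes are not specified in this section.

```python
import math
import random


def round_nearest(x, frac_bits):
    """Deterministic round-to-nearest onto the grid with spacing 2**-frac_bits."""
    scale = 2 ** frac_bits
    return math.floor(x * scale + 0.5) / scale


def stochastic_round(x, frac_bits, rng):
    """Unbiased stochastic rounding onto the same fixed-point grid.

    x is rounded up with probability equal to its relative position between
    the two neighbouring grid points, so the expected result equals x.
    """
    scale = 2 ** frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    p_up = scaled - lower  # 0 when x is already representable
    return (lower + (1 if rng.random() < p_up else 0)) / scale


def gradient_descent(x0, rounder, lr=0.1, steps=500):
    """Gradient descent on f(x) = x**2 / 2, whose gradient is x.

    Simplifying assumption: the update is computed exactly and only the new
    iterate is rounded to fixed point.
    """
    x = x0
    for _ in range(steps):
        x = rounder(x - lr * x)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    frac_bits = 4  # grid spacing 1/16
    x_rn = gradient_descent(0.25, lambda v: round_nearest(v, frac_bits))
    x_sr = gradient_descent(0.25, lambda v: stochastic_round(v, frac_bits, rng))
    print("round-to-nearest:", x_rn)
    print("stochastic rounding:", x_sr)
```

With these values the step lr * x = 0.025 is smaller than half the grid spacing (0.03125), so round-to-nearest returns the iterate to 0.25 at every step. Stochastic rounding accepts the smaller grid value with positive probability and therefore keeps making progress.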