On the Benefit of Width for Neural Networks: Disappearance of Basins

SIAM Journal on Optimization · 2022
Cited by 9
ABS 3

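Summary

This paper proves that as network width increases, the loss surface goes from having suboptimal basins (strict local minima) to having none, which shows the optimization advantage that width brings.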

Abstract

Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove? To understand the benefit of width, it is important to identify the difference between wide and narrow networks. In this work, we prove that from narrow to wide networks, there is a phase transition from having suboptimal basins to no suboptimal basins. Specifically, we prove two results: on the positive side, for any continuous activation functions, the loss surface of a class of wide networks has no suboptimal basin, where “basin” is defined as the setwise strict local minimum; on the negative side, for a large class of networks with width below a threshold, we construct strict local minima that are not global. These two results together show the phase transition from narrow to wide networks.

Neural networks · Optimization landscape · Phase transition · Mathematical optimization · Deep learning theory