Optimal deep neural networks by maximization of the approximation power
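We propose an optimal architecture for deep neural networks of a given size, choosing width and depth by maximizing the lower bound on the number of linear regions approximated by a network with ReLU activation functions; Monte Carlo simulations and the Boston Housing dataset show that it outperforms cross-validation and grid search.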
We propose an optimal architecture for deep neural networks of a given size. The optimal architecture is obtained by maximizing the lower bound on the maximum number of linear regions approximated by a deep neural network with a ReLU activation function. The accuracy of the approximating function depends on the network structure, which is characterized by the number of nodes and by the dependence and hierarchy between nodes within and across layers. We show how the accuracy of the approximation improves as the width and depth of the network are chosen optimally. A Monte Carlo simulation exercise illustrates that the optimized architecture outperforms cross-validation and grid search for linear and nonlinear prediction models. An application of this methodology to the Boston Housing dataset confirms empirically that our method outperforms state-of-the-art machine learning models.
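To make the idea of choosing width and depth by maximizing a linear-region bound concrete, the following is a minimal sketch under stated assumptions, not the procedure used in this paper. It assumes the well-known lower bound of Montufar et al. (2014) for ReLU networks with input dimension n_0 and hidden widths n_1, ..., n_L (each at least n_0), namely prod_{i<L} floor(n_i / n_0)^{n_0} times sum_{j=0}^{n_0} C(n_L, j). It further assumes that "size" means the total number of hidden neurons and that all hidden layers share the same width. The function names are hypothetical, and the bound actually maximized in the paper may differ.

```python
from math import comb


def region_lower_bound(n_in, widths):
    """Assumed bound (Montufar et al., 2014) on the number of linear regions.

    n_in   : input dimension n_0
    widths : list of hidden-layer widths n_1, ..., n_L
    Returns 0 when some width is below n_in, where the bound does not apply.
    """
    if not widths or any(w < n_in for w in widths):
        return 0
    bound = 1
    # Each of the first L-1 layers contributes floor(n_i / n_0) ** n_0.
    for w in widths[:-1]:
        bound *= (w // n_in) ** n_in
    # The last hidden layer contributes sum_{j=0}^{n_0} C(n_L, j).
    bound *= sum(comb(widths[-1], j) for j in range(n_in + 1))
    return bound


def best_uniform_architecture(n_in, total_neurons):
    """Search equal-width architectures under a hidden-neuron budget.

    Returns (depth, width, bound) maximizing the assumed lower bound,
    or None if no admissible architecture exists.
    """
    best = None
    for depth in range(1, total_neurons + 1):
        width = total_neurons // depth  # equal split; leftovers are unused
        if width < n_in:
            break  # deeper networks would be even narrower
        bound = region_lower_bound(n_in, [width] * depth)
        if best is None or bound > best[2]:
            best = (depth, width, bound)
    return best


if __name__ == "__main__":
    # Example: 2 inputs and a budget of 8 hidden neurons.
    print(best_uniform_architecture(2, 8))
```

In this toy setting, two layers of four neurons give a larger bound than a single layer of eight or four layers of two, which illustrates the trade-off between width and depth that the abstract refers to.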