Fast and Effective: A Novel Sequential Single-Path Search for Mixed-Precision-Quantized Networks
提出一种顺序单路径搜索方法,在硬件资源等约束下快速确定深度神经网络各层的量化位宽,实验表明该方法在多种架构和数据集上优于均匀精度模型。
Model quantization can reduce the model size and computational latency, it has been successfully applied for many applications of mobile phones, embedded devices, and smart chips. Mixed-precision quantization models can match different bit precision according to the sensitivity of different layers to achieve great performance. However, it is difficult to quickly determine the quantization bit precision of each layer in deep neural networks under some constraints (for example, hardware resources, energy consumption, model size, and computational latency). In this article, a novel sequential single-path search (SSPS) method for mixed-precision model quantization is proposed, in which some given constraints are introduced to guide the searching process. A single-path search cell is proposed to combine a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to the selection certainties to exponentially reduce the search space and speed up the convergence of the searching process. Experiments show that our method can efficiently search the mixed-precision models for different architectures (for example, ResNet-20, 18, 34, 50, and MobileNet-V2) and datasets (for example, CIFAR-10, ImageNet, and COCO) under given constraints, and our experimental results verify that SSPS significantly outperforms their uniform-precision counterparts.