Efficient Evolutionary Neural Architecture Search With Hierarchical Parameter Mapping for Monocular Depth Estimation
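Proposes the PTF-EvoMDE framework, which avoids pretraining candidate networks through hierarchical parameter mapping and, combined with a feature-aligned conditional random fields decoder, achieves accuracy comparable to DPT on KITTI with only 5% of the parameters while reducing NAS computational cost by more than 75%.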
Monocular depth estimation (MDE) plays a crucial role in real-world applications such as autonomous driving and augmented reality. However, automating the design of efficient MDE models via neural architecture search (NAS) remains challenging for two reasons: (1) the high computational cost of pretraining numerous candidate encoders on large-scale datasets such as ImageNet and (2) the substantial memory demands of high-resolution depth estimation models. To address these issues, this paper introduces PTF-EvoMDE, an evolutionary NAS framework that eliminates the need for per-candidate pretraining. PTF-EvoMDE incorporates a hierarchical parameter mapping (HPM) strategy that transfers weights from a single pretrained template network (MobileNetV2) to candidate architectures with varying depths, widths, and kernel sizes, significantly reducing computational overhead. Additionally, a feature-aligned conditional random fields (Fa-CRFs) decoder is proposed, which uses deformable convolutions to dynamically align features and mitigate spatial misalignment, thereby enhancing depth prediction accuracy. Experiments on the KITTI benchmark demonstrate that PTF-EvoMDE achieves an absolute relative error (Abs Rel) of 0.061 and a root mean squared error (RMSE) of 2.470, comparable to the Dense Prediction Transformer (DPT) model (Abs Rel: 0.062, RMSE: 2.573), while requiring only 5% of its parameters. Moreover, PTF-EvoMDE reduces the computational cost of NAS by more than 75% compared with conventional evolutionary approaches that rely on per-candidate pretraining. The resulting lightweight encoder exhibits strong transferability, achieving competitive performance on COCO object detection (42.1% mAP) and Cityscapes semantic segmentation (76.4% mIoU) with minimal fine-tuning.
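For readers unfamiliar with the two depth metrics quoted above, the following is a minimal sketch of how Abs Rel and RMSE are conventionally computed. It is not the paper's evaluation code: the function names `abs_rel` and `rmse`, the flat-list input, and the toy values are assumptions made for illustration, as is the masking of pixels without ground truth (marked here as 0.0).

```python
import math


def _valid_pairs(pred, target):
    # Keep only pixels that have ground-truth depth (assumed convention:
    # a target of 0.0 means "no measurement", as in sparse LiDAR maps).
    return [(p, t) for p, t in zip(pred, target) if t > 0]


def abs_rel(pred, target):
    """Absolute relative error: mean of |pred - target| / target."""
    pairs = _valid_pairs(pred, target)
    return sum(abs(p - t) / t for p, t in pairs) / len(pairs)


def rmse(pred, target):
    """Root mean squared error: sqrt of the mean of (pred - target)^2."""
    pairs = _valid_pairs(pred, target)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


# Toy depths in metres (hypothetical values, not taken from the paper).
pred = [10.0, 20.0, 33.0, 5.0]
target = [8.0, 20.0, 30.0, 0.0]  # last pixel has no ground truth

print(f"Abs Rel: {abs_rel(pred, target):.4f}")
print(f"RMSE:    {rmse(pred, target):.4f}")
```

Abs Rel is dimensionless and weights near-range errors more heavily because each error is divided by the true depth, whereas RMSE is expressed in the depth unit and is dominated by large errors at long range. Lower is better for both.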