Efficient Evolutionary Neural Architecture Search With Hierarchical Parameter Mapping for Monocular Depth Estimation

IEEE Transactions on Evolutionary Computation · 2025
Cited by 1
ABS 4

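Summary

Proposes the PTF-EvoMDE framework, which uses hierarchical parameter mapping to avoid pretraining candidate networks and pairs it with a feature-aligned conditional random fields decoder. On KITTI it reaches accuracy comparable to DPT with only 5% of the parameters, and it cuts the computational cost of NAS by more than 75%.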

Abstract

Monocular depth estimation (MDE) plays a crucial role in various real-world applications, including autonomous driving and augmented reality. However, automating the design of efficient MDE models via neural architecture search (NAS) remains challenging due to (1) the high computational cost of pretraining numerous candidate encoders on large-scale datasets such as ImageNet and (2) the substantial memory demands of high-resolution depth estimation models. To address these issues, this paper introduces PTF-EvoMDE, an evolutionary NAS framework that eliminates the need for per-candidate pretraining. PTF-EvoMDE incorporates a hierarchical parameter mapping (HPM) strategy that transfers weights from a single pretrained template network (MobileNetV2) to candidate architectures with varying depths, widths, and kernel sizes, significantly reducing computational overhead. Additionally, a feature-aligned conditional random fields (Fa-CRFs) decoder is proposed, which leverages deformable convolutions to dynamically align features and mitigate spatial misalignment, thereby enhancing depth prediction accuracy. Experiments on the KITTI benchmark demonstrate that PTF-EvoMDE achieves an absolute relative error (Abs Rel) of 0.061 and a root mean squared error (RMSE) of 2.470, comparable to the Dense Prediction Transformer (DPT) model (Abs Rel: 0.062, RMSE: 2.573), while requiring only 5% of the parameters. Moreover, PTF-EvoMDE reduces the computational cost of NAS by more than 75% compared to conventional evolutionary approaches that rely on per-candidate pretraining. The resulting lightweight encoder exhibits strong transferability, achieving competitive performance on COCO object detection (42.1% mAP) and Cityscapes semantic segmentation (76.4% mIoU) with minimal fine-tuning.
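For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how Abs Rel and RMSE are commonly defined for depth estimation. It is not taken from the paper: the function names, the flat-list input, and the convention that a ground-truth value of 0 marks a missing pixel are all assumptions made for illustration, and benchmark-specific steps such as depth capping or cropping are omitted.

```python
import math


def _valid_pairs(pred, target):
    """Keep only pixels with ground truth (assumed convention: target > 0)."""
    pairs = [(p, t) for p, t in zip(pred, target) if t > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return pairs


def abs_rel(pred, target):
    """Absolute relative error: mean of |pred - target| / target."""
    pairs = _valid_pairs(pred, target)
    return sum(abs(p - t) / t for p, t in pairs) / len(pairs)


def rmse(pred, target):
    """Root mean squared error: sqrt of the mean of (pred - target)^2."""
    pairs = _valid_pairs(pred, target)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


# Toy values only; these are not results from the paper.
pred = [10.0, 20.0, 30.0, 5.0]
gt = [8.0, 20.0, 40.0, 0.0]  # the last pixel has no ground truth and is skipped

print("Abs Rel:", abs_rel(pred, gt))
print("RMSE:", rmse(pred, gt))
```

Lower is better for both metrics. Abs Rel normalizes each error by the true depth, so it weights near and far pixels more evenly, whereas RMSE is dominated by large absolute errors, which tend to occur at long range.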

Monocular depth estimation · Neural architecture search · Evolutionary algorithms · Computer vision