Hybrid Embedding SAM-Guided Feedback Network for RGB–Thermal Urban Scene Parsing

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2026

Citations: 0

ABS 3

Xun Yang
Yaoru Sun
Chenglong Xu
Bo Yuan
Xuejie Yang
Qunhui Yang

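Summary

Proposes a hybrid embedding feedback network built on the SAM framework. Through modality structure alignment and cross-architecture knowledge transfer, it improves the accuracy and robustness of RGB-thermal urban street scene segmentation, raising mean accuracy by about 5% on multiple datasets.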

Abstract

In multimodal semantic segmentation of urban street scenes, existing methods do not model intermodal structural alignment or semantic cooperation between architectures, which leads to insufficient fused feature representations. To address this issue, this article proposes a novel structural optimization network: a hybrid embedding segment anything model (SAM) guided feedback network (GFNet). The network is built on the SAM framework and achieves multimodal structural alignment by transforming the semantic prior (SP) extractor through module-level fine-tuning of the image encoder. This article further proposes a cross-architecture knowledge transfer (CAKT) mechanism that injects the structural awareness capability of SAM into the backbone features of each layer, achieving dual optimization of alignment and enhancement. To address intermodal heterogeneity and semantic conflict, this article combines complementary fusion at different frequencies with cross-modal similarity enhancement strategies to achieve fine-grained semantic fusion and consistency modeling, supplemented by a dual-supervised constraint mechanism that improves modal independence and robustness. On several challenging datasets, mean accuracy (mAcc) is improved by about 5%, and GFNet demonstrates superior segmentation performance and robustness compared to existing methods. Our code will be released to the public at https://github.com/WBangG/GFNet.
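The abstract reports its gain in mAcc without defining the metric. The sketch below assumes the definition commonly used in semantic segmentation: per-class accuracy (correctly labeled pixels divided by ground-truth pixels of that class), averaged over the classes present. The function name, the flattened label lists, and the `ignore_index` value of 255 are illustrative assumptions, not taken from the GFNet code.

```python
from collections import defaultdict


def mean_class_accuracy(pred, target, ignore_index=255):
    """Return (mAcc, per-class accuracy) for flattened label sequences.

    Assumed definition: for each class present in `target`, accuracy is
    correct pixels / ground-truth pixels; mAcc is the unweighted mean.
    Pixels labeled `ignore_index` in `target` are skipped (assumption).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        total[t] += 1
        if p == t:
            correct[t] += 1
    if not total:
        raise ValueError("target contains no labeled pixels")
    per_class = {c: correct[c] / total[c] for c in total}
    return sum(per_class.values()) / len(per_class), per_class


# Toy example: three classes plus one ignored pixel.
target = [0, 0, 0, 0, 1, 1, 2, 255]
pred = [0, 0, 0, 1, 1, 1, 0, 2]
macc, per_class = mean_class_accuracy(pred, target)
print(per_class)  # class 0: 3 of 4 correct, class 1: 2 of 2, class 2: 0 of 1
print(round(macc, 4))
```

Because every class carries equal weight, a rare class that is entirely missed (class 2 here) pulls mAcc down much more than it would pull down overall pixel accuracy. The abstract does not say whether "about 5%" refers to percentage points or to a relative change.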

Multimodal semantic segmentation · Urban street scene parsing · Deep learning · Image segmentation