
History-Guided Prompt Generation for Vision-and-Language Navigation

IEEE Transactions on Cybernetics · 2025
Cited by 1
ABS 3

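Summary

Proposes a history-guided prompt generation framework that uses entropy to decide when to exploit historical information and generates prompt vectors that enhance the agent's perception of the current environment; its effectiveness is validated on four mainstream benchmarks.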
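A minimal sketch of the entropy test mentioned in the summary follows. It assumes that the gate compares the Shannon entropy of the previous step's action distribution, normalised by log K for K candidate actions, against a fixed threshold, and that history is consulted when uncertainty is high. The direction of the test, the normalisation, and the names `action_entropy`, `should_use_history`, and `threshold` are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def action_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a discrete action distribution."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    h = 0.0
    for p in probs:
        q = p / total  # renormalise in case the input does not sum exactly to 1
        if q > 0.0:  # convention: 0 * log 0 = 0
            h -= q * math.log(q)
    return h


def should_use_history(prev_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical gate: consult history when the previous step was uncertain.

    Entropy is divided by log(K), its maximum for K candidate actions, so the
    normalised value lies in [0, 1] whatever the number of candidates.
    """
    k = len(prev_probs)
    if k < 2:
        return False  # one candidate: no uncertainty to resolve
    return action_entropy(prev_probs) / math.log(k) > threshold
```

For example, a confident previous step such as `[0.9, 0.05, 0.05]` (normalised entropy of about 0.36) keeps the gate closed, while `[0.4, 0.35, 0.25]` (about 0.98) opens it. Normalising keeps a single threshold meaningful when the number of navigable candidates changes from one viewpoint to the next.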

Abstract

Vision-and-language navigation (VLN) has garnered extensive attention in the field of embodied artificial intelligence. VLN involves sequential (time-series) information: historical observations contain rich contextual knowledge and play a crucial role in navigation. However, current methods neither explicitly mine the connection between this historical context and the current environment nor adaptively learn cues related to the current environment. We therefore explore a prompt-learning-based strategy that adaptively mines the historical information most relevant to the current environment to enhance the agent's perception of it, and propose a history-guided prompt generation (HGPG) framework. Specifically, HGPG consists of two parts. The first is an entropy-based history acquisition module that assesses the uncertainty of the action probability distribution from the preceding step to determine whether historical information should be used at the current time step. The second is a prompt generation module that transforms historical context into prompt vectors by sampling from an end-to-end learned token library. These prompt tokens serve as discrete, knowledge-rich representations that encode semantic cues from historical observations in a compact form, making them easier for the decision network to understand and utilize. In addition, we share the token library across navigation tasks to mine features common to different tasks and thereby improve generalization to unseen environments. Extensive experimental results on four mainstream VLN benchmarks (R2R, REVERIE, SOON, R2R-CE) demonstrate the effectiveness of the proposed method. Code is available at https://github.com/Wzmshdong/HGPG.
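The prompt generation step can be illustrated with a small sketch that conditions a discrete choice of library tokens on a history vector. Here the logits are dot-product similarities and the k tokens are drawn with the Gumbel-top-k trick (sampling without replacement from a softmax). Both the scoring function and the sampler are assumed stand-ins; the paper's end-to-end training of the library, which would need a differentiable relaxation such as Gumbel-softmax or a straight-through estimator, is not shown. `sample_prompt_tokens`, `build_prompt`, `k`, and `temperature` are hypothetical names.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sample_prompt_tokens(
    history: Sequence[float],
    library: Sequence[Sequence[float]],
    k: int,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hypothetical sampler: pick k distinct library indices given a history vector.

    Adding Gumbel(0, 1) noise to each logit and keeping the k largest values
    draws k tokens without replacement from softmax(logits / temperature).
    """
    if not 0 < k <= len(library):
        raise ValueError("k must be between 1 and len(library)")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    scored = []
    for idx, token in enumerate(library):
        logit = dot(history, token) / temperature
        u = max(rng.random(), 1e-12)  # random() is in [0, 1); avoid log(0)
        gumbel = -math.log(-math.log(u))
        scored.append((logit + gumbel, idx))
    scored.sort(reverse=True)  # highest perturbed logit first
    return [idx for _, idx in scored[:k]]


def build_prompt(
    history: Sequence[float],
    library: Sequence[Sequence[float]],
    k: int,
    **kwargs,
) -> List[Vector]:
    """Return copies of the selected library vectors, used as prompt tokens."""
    return [list(library[i]) for i in sample_prompt_tokens(history, library, k, **kwargs)]
```

Sharing the library across tasks, as the abstract describes, would correspond in this sketch to passing the same `library` object when building prompts for R2R, REVERIE, SOON, and R2R-CE episodes. Here the library is a plain list of vectors, whereas in a trained model it would be a learned parameter matrix.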

Vision-and-language navigation · embodied artificial intelligence · prompt learning · time-series information