Task-Oriented High-Order Context Graph Networks for Few-Shot Human-Object Interaction Recognition

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2021

Cited by 9

Zhong Ji
Ping An
Xiyao Liu
Yanwei Pang
Ling Shao
Zhongfei Zhang

Summary

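Proposes a task-oriented high-order context graph network that builds a task-level graph and exploits high-order context information to recognize interactions between human actions and surrounding objects in few-shot settings, outperforming existing methods on the HICO-FS and TUHOI-FS datasets.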

Abstract

Few-shot human-object interaction (FS-HOI) recognition aims to infer new interactions between human actions and surrounding objects from only a few available instances. It helps alleviate the long-tail and combinatorial explosion problems in human-object interaction (HOI). However, existing FS-HOI methods focus only on modeling the relationships between labeled and unlabeled samples in the Euclidean domain, neglecting the rich relational structures of the visual information among labeled samples and between human actions and objects. Accordingly, we tackle the few-shot HOI task in the non-Euclidean domain and present a graph-based model, the task-oriented high-order context graph network (THCG-Net). It contains a task attention module (TA-Module) and a high-order context graph module (HG-Module). The TA-Module uses task information in an attention mechanism to build a task-oriented space; embedding the visual features into this space captures the information that is discriminative for the current task (episode). The HG-Module constructs a task-level graph and takes the context information as high-order knowledge, which provides discriminative guidance for propagating visual information. It captures the discriminability among different categories while adaptively highlighting the commonality of related categories, which effectively transfers knowledge to related categories. Extensive experiments on two benchmark datasets, HICO-FS and TUHOI-FS, show that THCG-Net significantly outperforms state-of-the-art approaches, demonstrating its effectiveness in recognizing various human actions and surrounding objects in few-shot scenarios.
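
To make the two-stage idea in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' THCG-Net. It assumes a toy episode of hand-written feature vectors, uses per-channel variance across the episode as a stand-in for task attention, and performs one step of similarity-weighted label propagation from support nodes to each query node. The high-order context guidance described in the abstract is not modeled. All function names (`task_attention`, `propagate_labels`, `classify_episode`) and the `temperature` parameter are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def task_attention(features):
    """Reweight feature channels with episode-level statistics.

    Assumption for illustration: channels that vary most across the
    episode are treated as the most discriminative for this task.
    """
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    weights = softmax(var)
    return [[w * x for w, x in zip(weights, f)] for f in features]

def propagate_labels(support, support_labels, query, num_classes, temperature=0.1):
    """One round of message passing from labeled support nodes to a query node."""
    # Edge weights: softmax over temperature-scaled cosine similarities.
    edges = softmax([cosine(query, s) / temperature for s in support])
    scores = [0.0] * num_classes
    for w, label in zip(edges, support_labels):
        scores[label] += w
    return scores

def classify_episode(support, support_labels, queries, num_classes):
    """Embed the whole episode into the task-weighted space, then classify queries."""
    embedded = task_attention(support + queries)
    emb_support = embedded[:len(support)]
    emb_queries = embedded[len(support):]
    preds = []
    for q in emb_queries:
        scores = propagate_labels(emb_support, support_labels, q, num_classes)
        preds.append(max(range(num_classes), key=scores.__getitem__))
    return preds

# Toy 2-way 2-shot episode; the third channel is constant and carries no signal.
support = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.5], [0.0, 1.0, 0.5], [0.1, 0.9, 0.5]]
support_labels = [0, 0, 1, 1]
queries = [[0.8, 0.2, 0.5], [0.2, 0.8, 0.5]]
predictions = classify_episode(support, support_labels, queries, num_classes=2)
```

In this toy episode the constant third channel receives the smallest attention weight, so the similarity graph is dominated by the two informative channels. A learned attention module and a multi-layer graph network would replace both hand-written steps in practice.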

Few-shot learning · Human-object interaction recognition · Graph neural networks · Computer vision
