Multiscale Skeleton-Based Temporal Action Segmentation Using Hierarchical Temporal Modeling and Prediction Ensemble

IEEE Transactions on Cybernetics · 2025
Cited by 1
ABS 3

Summary

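Proposes a multiscale skeleton-based temporal action segmentation method that uses a temporal probability pyramid and a label-smoothed ensemble to improve segmentation accuracy while reducing computation; it is particularly suited to complex action instances.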

Abstract

Skeleton-based temporal action segmentation (TAS) decomposes an untrimmed skeleton sequence into meaningful segments. Because actions vary widely in temporal scale, the skeleton modeling network must balance over-segmentation against under-segmentation. Current methods often rely on parallel multiscale feature extractors and additional refinement modules to mitigate the multiscale issue, which adds significant computation and complexity. To address these issues, this article proposes multiscale skeleton-based TAS (MSTAS), consisting of a temporal probability pyramid (TPP) and a smoothed multiscale ensemble (SME). TPP represents each action as a collection of multiscale probability distributions using a U-shaped hierarchical temporal pyramid. SME then averages these distributions to produce the segmentation, instead of deploying additional refinement stages. Because each scale tends to be overconfident, SME incorporates a novel label smoothing phase that improves the probability distributions by dynamically calibrating the confidence of each scale. Experimental results on four public datasets show that MSTAS achieves state-of-the-art performance with lower computational overhead, for example +1.1% accuracy and +2.8% F1@0.5 on the challenging LARa dataset with 70% fewer parameters and 80% fewer GFLOPs. Benefiting from confidence calibration, MSTAS efficiently utilizes more temporal scales while remaining better calibrated on ambiguous action instances. Additionally, the U-shaped pyramid is strongly compatible with classical refinement modules, enabling efficient extraction of multiscale motion representations.
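
To make the ensemble step concrete, the following is a minimal sketch of averaging per-frame class distributions from several temporal scales. It is not the authors' implementation. The abstract does not specify how coarse scales are aligned to the frame rate or how the confidence of each scale is calibrated, so the nearest-neighbour upsampling, the fixed per-scale smoothing factor `eps` (mixing each distribution with the uniform distribution), and all function names here are assumptions made for illustration. The dynamic calibration described in the abstract is replaced by these fixed factors.

```python
# Illustrative sketch only: all names and the fixed eps values are assumptions,
# not details taken from the MSTAS paper.

def upsample_nearest(probs, target_len):
    """Repeat coarse per-frame distributions so the sequence has target_len frames."""
    src_len = len(probs)
    return [probs[min(t * src_len // target_len, src_len - 1)]
            for t in range(target_len)]


def smooth(dist, eps):
    """Soften a distribution by mixing it with uniform: (1 - eps) * p + eps / C."""
    num_classes = len(dist)
    return [(1.0 - eps) * p + eps / num_classes for p in dist]


def ensemble(scales, eps_per_scale, target_len):
    """Average the smoothed, upsampled distributions and take the per-frame argmax."""
    num_classes = len(scales[0][0])
    avg = [[0.0] * num_classes for _ in range(target_len)]
    for probs, eps in zip(scales, eps_per_scale):
        for t, dist in enumerate(upsample_nearest(probs, target_len)):
            for k, p in enumerate(smooth(dist, eps)):
                avg[t][k] += p / len(scales)
    labels = [max(range(num_classes), key=row.__getitem__) for row in avg]
    return avg, labels


# Toy input: two classes, a fine scale (4 frames) and a coarse scale (2 frames).
fine = [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]]
coarse = [[0.99, 0.01], [0.2, 0.8]]

# Assumed: the coarse scale is treated as more overconfident, so it is smoothed more.
avg, labels = ensemble([fine, coarse], eps_per_scale=[0.0, 0.2], target_len=4)
print(labels)
```

Because each smoothed distribution still sums to one, the averaged result is a valid per-frame distribution, and no separate refinement stage is needed to read off a label for each frame.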

Computer vision · Action recognition · Temporal segmentation · Skeleton analysis