Multiscale Pooling Spatial–Temporal Attention Network: Elevating Cross Session and Small Sample Decoding in Motor Imagery Brain–Computer Interfaces

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2026

Cited by: 0

ABS 3

Weijie Chen
Yixin Chen
Xiao Wu
Xinjie He
Xingyu Wang
Jing Jin

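Summary

The paper proposes the multiscale pooling spatial–temporal attention network (MPSTANet), which combines mix pooling with spatial–temporal attention mechanisms. It achieves high-accuracy cross-session motor imagery decoding on four public datasets and outperforms existing deep learning models, particularly in small-sample scenarios.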

Abstract

Motor imagery (MI) is one of the most widely used paradigms in brain–computer interfaces (BCIs), known for its ability to trigger changes in brain activity without the need for an external “cue” stimulus. This unique characteristic has attracted significant attention from neuroscientists and researchers in fundamental science. However, compared to P300 and steady-state visual evoked potential (SSVEP), neural activity related to MI tends to be less stable and exhibits substantial variability between individuals. Consequently, accurately decoding MI, using both traditional machine learning and deep learning, has proven to be a considerable challenge. Moreover, given the difficulty of acquiring electroencephalography (EEG) data and the high data demands of deep learning, enhancing the accuracy of MI decoding with limited sample sizes remains a pressing issue. This article addresses these challenges by introducing a novel deep neural network for accurate MI decoding that is designed to be effective with both small sample sizes and larger datasets. This network, named the multiscale pooling spatial–temporal attention network (MPSTANet), integrates mix pooling techniques with spatial–temporal attention mechanisms. MPSTANet first employs local and global spatial attention, along with multiscale temporal attention, to thoroughly extract spatial–temporal information from EEG signals. Next, MPSTANet utilizes feature fusion and the proposed mix pooling technique to preserve as much of the extracted spatial–temporal information as possible. Finally, channel interaction attention (CIA) and 3-D weight attention (3-DWA) are employed to recalibrate the weights of the fused channels and spatial–temporal features, respectively.
To validate the performance of the proposed MPSTANet model, we conducted experiments on four public datasets, covering both small-sample and subject-independent scenarios. MPSTANet achieved cross-session decoding accuracies of 84.82%, 72.92%, 88.20%, and 46.54% on the BCI Competition IV 2a dataset, the Open BMI dataset, the BCI Competition IV 2b dataset, and the PhysioNet dataset, respectively. Furthermore, MPSTANet held a significant lead over other deep learning models in both small-sample and subject-independent experiments. These results demonstrate the robustness of MPSTANet in MI decoding and its promising potential for BCI applications.
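
The abstract names a "mix pooling" step applied at multiple scales but does not define it. As a rough illustration only, the sketch below assumes that mix pooling means a weighted blend of max pooling and average pooling over non-overlapping time windows, and that "multiscale" means concatenating the results for several window lengths. The function names, the `alpha` weight, and the window sizes are all hypothetical and are not taken from the paper.

```python
import numpy as np


def mix_pool_1d(x, window, alpha=0.5):
    """Blend max and average pooling along the time axis.

    ASSUMPTION: this blend is an illustrative guess at "mix pooling",
    not the definition used by MPSTANet.

    x      : array of shape (channels, time)
    window : length of each non-overlapping pooling window
    alpha  : hypothetical weight on the max-pooled term (0..1)
    """
    channels, time = x.shape
    n_windows = time // window  # trailing samples that do not fill a window are dropped
    trimmed = x[:, : n_windows * window].reshape(channels, n_windows, window)
    avg = trimmed.mean(axis=-1)  # average pooling keeps the overall level
    mx = trimmed.max(axis=-1)    # max pooling keeps the strongest response
    return alpha * mx + (1.0 - alpha) * avg


def multiscale_mix_pool(x, windows=(2, 4), alpha=0.5):
    """Apply mix pooling at several window lengths and concatenate along time."""
    return np.concatenate([mix_pool_1d(x, w, alpha) for w in windows], axis=-1)


# Toy "EEG" segment: 2 channels, 4 time samples.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 0.0, 8.0, 0.0]])
pooled = multiscale_mix_pool(x, windows=(2, 4), alpha=0.5)
print(pooled.shape)
```

Blending the two pooling modes is one plausible way to retain both the average level and the peak response of a window, which is in the spirit of the abstract's goal of preserving as much extracted information as possible; the actual MPSTANet design may differ.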

Brain–computer interface · Motor imagery · Deep learning · EEG decoding · Attention mechanism
