A Multimodal Data Processing System for LiDAR-Based Human Activity Recognition

IEEE Transactions on Cybernetics · 2021

Cited by 63

ABS 3

Jamie Roche
Varuna De-Silva
Joosep Hook
Mirco Moencks
A.M. Kondoz

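Summary

Proposes a multimodal framework that fuses RGB and LiDAR point cloud data and uses an R-CNN and a 3-D modified Fisher vector network for human activity recognition. It reaches 90% accuracy on a self-collected dataset and is applicable to sports analytics, understanding social behavior, surveillance, and autonomous driving.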

Abstract

Increasingly, the task of detecting and recognizing human actions has been delegated to some form of neural network that processes camera or wearable sensor data. Because cameras are strongly affected by lighting and wearable sensor data are scant, neither modality alone can capture the data required to perform the task confidently. Range sensors, such as light detection and ranging (LiDAR), can therefore complement the process and help perceive the environment more robustly. Most recently, researchers have been exploring ways to apply convolutional neural networks to 3-D data. These methods typically rely on a single modality and cannot draw on information from complementary sensor streams to improve accuracy. This article proposes a framework that tackles human activity recognition by leveraging the benefits of sensor fusion and multimodal machine learning. Given both RGB and point cloud data, our method describes the activities being performed by subjects using regions with a convolutional neural network (R-CNN) and a 3-D modified Fisher vector network. Evaluation on a custom-captured multimodal dataset demonstrates that the model classifies human activity with 90% accuracy. Furthermore, this framework can be used for sports analytics, understanding social behavior, surveillance, and, perhaps most notably, by autonomous vehicles (AVs) to inform data-driven decision-making policies in urban areas and indoor environments.

Computer science · Human activity recognition · Sensor fusion · Convolutional neural networks · LiDAR
