Citation
Vijay, Harshil and Mathur, Shikar and Agarwal, Shashwat and Perumal, Thinagaran and Sharma, Abhishek
(2026)
Motionscope AI: comprehensive human activity recognition through integrated pose analysis and temporal modeling.
IEEE Sensors Journal, 26 (7).
pp. 10872-10882.
ISSN: 1530-437X; eISSN: 1558-1748
Abstract
This article introduces a practical deep learning framework for recognizing human activities in indoor environments, combining MediaPipe pose estimation with a multibranch bidirectional LSTM (BiLSTM) architecture. We extract comprehensive features (3-D pose landmarks, hand gestures, velocity, acceleration, and joint angles), resulting in a 685-D vector per frame. Our multibranch design processes each feature type through a specialized BiLSTM pathway with an attention mechanism, enabling the model to learn distinct spatial, temporal, and structural patterns. To ensure robust performance in real-world scenarios, we incorporate label smoothing, gradient clipping, and adaptive learning strategies. Evaluated on the IndoorActionDataset with eight activity classes, our approach achieves 95.6% test accuracy, significantly outperforming OpenPose-based methods (87.2%) and 3D-CNN with Transformer architectures (75.1%). With an inference latency of only 23 ms, the system demonstrates practical viability for real-time deployment on resource-constrained devices. The results confirm that thoughtful feature engineering combined with attention-driven temporal modeling can deliver both accuracy and efficiency for activity recognition tasks.
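The abstract names velocity, acceleration, and joint angles as derived features and label smoothing as a training regularizer, but does not say how they are computed. The following is a minimal sketch of one common way to derive such quantities from per-frame 3-D landmarks, not the authors' implementation. The function names, the backward finite-difference scheme, the zero padding of the first frame, the 30 fps frame rate, and the smoothing factor 0.1 are all assumptions made for illustration; only the eight-class count comes from the abstract.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at landmark b between segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        return 0.0  # degenerate case: two landmarks coincide
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def finite_difference(frames, dt):
    """Backward difference per coordinate; the first frame is zero-padded
    so the output has the same number of frames as the input."""
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([(c - p) / dt for p, c in zip(prev, cur)])
    return out


def kinematic_features(frames, fps=30.0):
    """Velocity and acceleration from a sequence of flattened landmark
    coordinates (one list of floats per frame). fps is an assumed value."""
    dt = 1.0 / fps
    velocity = finite_difference(frames, dt)
    acceleration = finite_difference(velocity, dt)
    return velocity, acceleration


def smooth_labels(class_index, num_classes=8, epsilon=0.1):
    """Label-smoothed target: (1 - epsilon) * one_hot + epsilon / num_classes.
    epsilon is an assumed value; the abstract does not report it."""
    off = epsilon / num_classes
    target = [off] * num_classes
    target[class_index] += 1.0 - epsilon
    return target
```

For example, `joint_angle` applied to hypothetical shoulder, elbow, and wrist landmarks gives the elbow flexion angle, and `kinematic_features` applied to the flattened landmark sequence yields per-frame velocity and acceleration that could be concatenated with the raw coordinates. Because of the zero padding, the first two frames of the acceleration output reflect the padding rather than real motion.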