UPM Institutional Repository

Integration of CNN and LSTM networks for behavior feature recognition: an analysis


Citation

Aris, Teh Noranis Mohd and Ningning, Chen and Mustapha, Norwati and Zolkepli, Maslina (2024) Integration of CNN and LSTM networks for behavior feature recognition: an analysis. International Journal on Advanced Science, Engineering and Information Technology, 14 (5). pp. 1793-1799. ISSN 2460-6952; eISSN: 2088-5334

Abstract

This study explores an integrated model that combines convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) for behavior feature recognition. First, a straightforward three-dimensional deep CNN structure was introduced for behavior recognition; it captures both static and dynamic characteristics, and its convergence speed was analyzed. Subsequent experiments used the VGG16 CNN model, with the fully connected layer replaced by global average pooling. A comparative experiment between the models was then conducted on the MSRC-12 behavior dataset. Because of the complexity of LSTM, a simpler GRU model with similar effectiveness was used for comparison. The experimental results showed that the GRU-CNN model performed best, outperforming other algorithms reported in the literature on the same dataset. Under the same experimental parameters, the GRU-CNN model converged significantly faster than the LSTM-CNN model and trained more quickly. In addition, the best accuracy was obtained by adjusting the dropout rate and the number of epochs. In the cross-validation used in this study, the GRU-CNN model achieved good experimental results when the hidden node dropout rate was 0.5. The number of epochs had negligible impact on the GRU-CNN model, whereas the accuracy of the CNN and CNN-GRU models increased significantly with more epochs, further validating the effectiveness of the GRU-CNN model. These experiments also indicate that convolutional neural networks based on deep learning are superior to traditional machine learning methods for human behavior recognition. Using depth images instead of conventional images allows for better extraction of spatial features, and the integration with long short-term memory networks enhances the extraction of temporal features from sequences.
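Two design choices mentioned in the abstract can be illustrated with a minimal sketch: replacing a fully connected layer with global average pooling, and preferring a GRU over an LSTM because it is simpler. The code below is not the authors' implementation. The function names, the toy feature maps, and the hidden size of 128 are hypothetical; the input size of 512 is chosen only because the last convolutional block of VGG16 has 512 channels. The parameter count assumes the textbook cell formulation with one bias vector per gate block.

```python
from typing import List


def global_average_pool(feature_maps: List[List[List[float]]]) -> List[float]:
    """Collapse each channel's H x W feature map to its mean value.

    feature_maps is indexed as [channel][row][col]. The result holds one
    number per channel, so no flattening or dense layer is required.
    """
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def recurrent_param_count(input_size: int, hidden_size: int, gate_blocks: int) -> int:
    """Count weights and biases of a recurrent cell (assumed formulation).

    Each gate block has an input matrix (hidden x input), a recurrent
    matrix (hidden x hidden), and one bias vector (hidden).
    An LSTM has 4 such blocks; a GRU has 3.
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return gate_blocks * per_block


if __name__ == "__main__":
    # Two hypothetical 2 x 2 feature maps (two channels).
    maps = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
    print(global_average_pool(maps))

    # Hypothetical sizes: 512 pooled CNN features in, 128 hidden units.
    print("LSTM parameters:", recurrent_param_count(512, 128, 4))
    print("GRU parameters:", recurrent_param_count(512, 128, 3))
```

Under these assumptions the GRU cell has three quarters of the parameters of an LSTM cell of the same size, which is consistent with the abstract's description of the GRU as the simpler model.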


Download File

117497.pdf - Published Version
Restricted to Repository staff only

Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.18517/ijaseit.14.5.10116
Publisher: Insight Society
Keywords: Behavior feature recognition; CNN; GRU; LSTM
Depositing User: Ms. Nuraida Ibrahim
Date Deposited: 27 May 2025 08:47
Last Modified: 27 May 2025 08:47
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.18517/ijaseit.14.5.10116
URI: http://psasir.upm.edu.my/id/eprint/117497