Citation
Husin, Nor Azura and Yaakob, Razali and Mustapha, Norwati and Ejaz, Muhammad Mudassir and Kamaruzaman, Nurul Nadhrah
(2024)
Smote-2DCNN for enhancing speech emotion recognition.
Journal of Theoretical and Applied Information Technology, 102 (13).
pp. 5079-5092.
ISSN: 1992-8645; eISSN: 1817-3195
Abstract
Speech emotion recognition (SER) is a specialized form of audio classification that aims to identify and classify the emotional states expressed in spoken language or speech signals. The main objective of this study is to propose an accurate audio classification model for SER. The study primarily focuses on two key issues: the insufficient training data within each available dataset and the imbalanced distribution of that data, both of which contribute to overfitting and degrade the accuracy of the audio classification model. To address this, we present SMOTE-2DCNN, a combination of the Synthetic Minority Oversampling Technique (SMOTE) with a 2-Dimensional Convolutional Neural Network (2DCNN), designed to handle imbalanced data distributions and achieve accurate emotion classification. The proposed SMOTE-2DCNN demonstrates strong performance, with an unweighted accuracy (UA) of 81% and a weighted accuracy (WA) of 80%, approximately 15% higher than the leading state-of-the-art method.
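To illustrate the oversampling step the abstract describes, the sketch below is a minimal, generic SMOTE in NumPy (not the authors' implementation, and independent of the 2DCNN classifier): each synthetic minority sample is produced by interpolating between a randomly chosen minority point and one of its k nearest minority-class neighbours. All names and the toy data are illustrative assumptions.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=None):
    """Generate n_new synthetic samples from minority-class features X_min.

    Minimal SMOTE sketch: pick a minority point, pick one of its k nearest
    minority neighbours, and interpolate between the two with a random weight.
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own neighbours
    k = min(k, n - 1)
    nbrs = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours per point
    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)              # a random minority sample
        b = nbrs[a, rng.integers(k)]     # one of its k neighbours
        lam = rng.random()               # interpolation weight in [0, 1)
        out[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return out

# Toy imbalanced setup: 20 majority-class rows vs 4 minority-class rows
# of 8-dimensional (e.g. spectrogram-derived) features.
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(20, 8))
X_minor = rng.normal(3.0, 1.0, size=(4, 8))

# Oversample the minority class until both classes have 20 samples.
X_synth = smote(X_minor, n_new=16, k=3, seed=1)
```

In the paper's pipeline the rebalanced feature set would then be fed to the 2DCNN classifier; here the point is only that, after oversampling, every emotion class contributes an equal number of training samples.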