Deep neural networks for Speech Enhancement and Speech Recognition: a systematic review

Citation

Natarajan, Sureshkumar and Rahman Al-Haddad, Syed Abdul and Ahmad, Faisul Arif and Kamil, Raja and Hassan, Mohd Khair and Azrad, Syaril and Macleans, June Francis and Abdulhussain, Sadiq H. and Mahmmod, Basheera M. and Saparkhojayev, Nurbek and Dauitbayeva, Aigul (2025) Deep neural networks for Speech Enhancement and Speech Recognition: a systematic review. Ain Shams Engineering Journal, 16 (7). art. no. 103405. pp. 1-35. ISSN 2090-4479

Abstract

The field of speech signal processing has undergone significant transformation through extensive research. There is growing interest in Speech Enhancement (SE) and Automatic Speech Recognition (ASR), with SE serving as a crucial preliminary step to enhance ASR performance. This paper addresses key challenges, particularly the need to maintain speech quality and improve intelligibility in ASR systems. Recently, deep learning techniques have emerged as powerful tools for tackling these challenges. This systematic review examines speech enhancement and recognition techniques, emphasizing denoising, acoustic modeling, and beamforming. Various deep learning architectures, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and Hybrid Neural Networks, are reviewed to highlight their roles in enhancement and recognition. The review specifically details their usage, the features utilized in each study, the databases employed, performance, and limitations, all presented in a structured tabular format. This approach provides valuable insights into the strengths and weaknesses of each method, guiding future advancements in the field. In particular, it emphasizes that LSTM-RNN models excel in temporal signal processing, while hybrid models demonstrate superior performance in optimizing task outcomes. The paper conducts a comprehensive statistical analysis of 187 research papers that exclusively utilize deep neural networks to address the challenges of speech enhancement and recognition, presenting the latest advances in the field. The review examines publications from 2012 to 2024, shedding light on research trends and patterns, while the proposed solutions aim to bridge gaps for researchers in this evolving domain.

Download File

124114.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL or Download Paper: https://www.sciencedirect.com/science/article/pii/...

Additional Metadata

Item Type:	Article
Subject:	Engineering (all)
Divisions:	Faculty of Computer Science and Information Technology; Faculty of Engineering
DOI Number:	https://doi.org/10.1016/j.asej.2025.103405
Publisher:	Ain Shams University
Keywords:	Acoustic modeling; Beamforming; Deep neural network; Denoising; Machine learning; Reverberation; Speech enhancement; Speech recognition; Systematic review
Depositing User:	Ms. Nur Faseha Mohd Kadim
Date Deposited:	07 Apr 2026 02:23
Last Modified:	07 Apr 2026 02:23
Altmetrics:	http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1016/j.asej.2025.103405
URI:	http://psasir.upm.edu.my/id/eprint/124114