UPM Institutional Repository

A speech enhancement framework using discrete Krawtchouk-Tchebichef Transform


Mahmmod, Basheera M. (2018) A speech enhancement framework using discrete Krawtchouk-Tchebichef Transform. Doctoral thesis, Universiti Putra Malaysia.


Speech is the primary mode of interaction among humans. During transmission, speech signals are exposed to interference and additive noise, which produce noisy signals. Robust Speech Enhancement Algorithms (SEAs) that suppress noise without distorting the original signal are therefore necessary. Removing noise without causing speech distortion is a challenging task. Moreover, the annoying artifact that appears after the enhancement process, called Musical Noise (MN), should be eliminated. Recent SEA approaches aim to enhance both speech quality and intelligibility, because improving these two attributes is critical for listeners with normal hearing and for those with hearing impairments. This thesis therefore aims to restore speech signals from corrupted signals with minimum MN and the best trade-off between Residual Noise (RN) and Signal Distortion (SD). First, a new transform based on new orthogonal polynomials, called the Discrete Krawtchouk–Tchebichef Transform (DKTT), is presented. DKTT exhibits superior compaction and localization properties, which affect the noise extraction process. Second, a noise classification method is adopted to identify the type of additive noise. Then, three types of optimum parameters are determined based on the noise type. The subsequent phase of the developed system involves the proposed non-linear speech estimator, which is based on the Minimum Mean Square Error (MMSE) and low-distortion approaches. The analytical solution is derived from the assumption that the speech and noise components can be modeled by a combination of Gamma and Laplacian distributions. This combination is used for the first time in the developed SEA. Afterward, a second, linear estimator is proposed, mainly to reduce the effects of MN. Finally, the inverse DKTT is applied to recover the clean signal. To demonstrate the capability of the proposed system, clean speech sentences are selected from the TIMIT dataset.
Moreover, eleven types of noise are chosen from the NOISEX-92 dataset, in addition to speech-shaped noise. These noises are the most dominant in the real world. Comparison results confirm the improvement in the quality and intelligibility measures, together with a reduction in the MN level. The objective measures include Perceptual Evaluation of Speech Quality (PESQ), Frequency-Weighted Segmental Signal-to-Noise Ratio (FWSNR), the Coherence Speech Intelligibility Index (CSII), and the Short-Time Objective Intelligibility (STOI) measure, along with three composite measures, namely Signal distortion (SIG), Background intrusiveness (BAK), and Overall quality (OVL). The proposed SEA demonstrated an improvement in nearly all of the aforementioned quality and intelligibility measures for different types of noise and five signal-to-noise ratio (SNR) levels, i.e., −10, −5, 0, 5, and 10 dB. In white noise, for example, the average absolute improvements (with their corresponding percentages) in PESQ, OVL, STOI, and FWSNR (in dB) over the five SNR levels are 0.37 (17.3%), 0.37 (24.7%), 0.59 (7.8%), and 0.06 (7.7%), respectively. For cockpit noise, the improvements are 0.22 (10.6%), 0.18 (10.5%), 1.5 (23.3%), and 0.07 (9.5%), respectively. For speech-shaped noise, the improvements are 0.23 (11.3%), 0.17 (9.1%), 2.05 (31.6%), and 0.05 (7.8%), respectively. Moreover, the classification accuracy reached 99.44%. This work contributes a new transform, new speech and noise models, and new linear and non-linear estimators with their adaptive smoothing parameter to achieve good noise reduction. In conclusion, the proposed SEA enhances noisy signals and recovers clean signals with less RN and SD and a reduced MN level. Moreover, the best improvement in quality and intelligibility is obtained at high noise levels in particular.
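The evaluation at fixed SNR levels relies on mixing clean speech with noise scaled to a target SNR. The abstract does not describe how the thesis performs this mixing, so the following is only a minimal sketch of the standard power-ratio approach. The function names `mix_at_snr` and `measure_snr_db` are hypothetical, and a synthetic tone and Gaussian noise stand in for the TIMIT and NOISEX-92 recordings.

```python
import math
import random


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (dB)."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # Required noise power is p_clean / 10^(SNR/10); gain is its square root
    # relative to the current noise power.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [x + n for x, n in zip(clean, scaled)]
    return noisy, scaled


def measure_snr_db(clean, noise):
    """Return 10*log10 of the clean-to-noise energy ratio."""
    e_clean = sum(x * x for x in clean)
    e_noise = sum(n * n for n in noise)
    return 10 * math.log10(e_clean / e_noise)


rng = random.Random(0)
# Synthetic stand-ins (assumption): a 200 Hz tone at 8 kHz and Gaussian noise.
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

# The five SNR levels listed in the abstract.
for target in (-10, -5, 0, 5, 10):
    noisy, scaled = mix_at_snr(clean, noise, target)
    print(target, round(measure_snr_db(clean, scaled), 6))
```

In this sketch the SNR is computed over the whole signal; evaluations on real speech often measure speech power over active segments only, which would change the gain.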

Additional Metadata

Item Type: Thesis (Doctoral)
Subject: Signal processing - Case studies
Subject: Speech processing systems
Call Number: FK 2018 137
Chairman Supervisor: Associate Professor Abd Rahman Ramli, PhD
Divisions: Faculty of Engineering
Depositing User: Mas Norain Hashim
Date Deposited: 20 Nov 2019 02:59
Last Modified: 20 Nov 2019 02:59
URI: http://psasir.upm.edu.my/id/eprint/75679