Voice Conversion Approach through Feature Statistical Mapping
Nasr, Abdulbaset M. (2001) Voice Conversion Approach through Feature Statistical Mapping. Masters thesis, Universiti Putra Malaysia.
Over the past few decades, the field of speech processing has changed dramatically and has become important both theoretically and technologically. Great advances have been made in a broad range of applications, including speech analysis and synthesis, voice recognition, text-to-speech conversion, and speech coding. In the course of this development, voice conversion (VC) has recently emerged as a new branch of speech synthesis that deals with speaker identity. The basic idea behind VC is to modify one person's speech so that it is recognized as having been uttered by another person. Voice conversion has numerous applications. One is the personalization of text-to-speech (TTS) systems, which reduces the need for a large speech database. Another is the entertainment industry, where VC could make film dubbing more effective by allowing the dubbing actor to speak with the voice of the original actor but in a different language. Voice conversion can also be used in language translation applications to create the identity of a foreign speaker.

This project proposes a simple parametric approach to VC based on the well-known speech analysis technique of Linear Prediction (LP). LP is used as an analysis tool to extract the most important acoustic parameters of a person's speech signal: the pitch period, the LP coefficients, the voicing decision, and the speech signal energy. The features of the source speaker are then mapped to match those of the target speaker using a statistical mapping technique. To illustrate the feasibility of the proposed approach, a simple-to-use voice conversion program was developed. The program code was written in C++ and implemented using the Microsoft Foundation Classes (MFC).
The proposed scheme produced satisfactory results, with the synthesized speech signal coming close to that of the target speaker.
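
The abstract does not give the details of the LP analysis or of the statistical mapping, so the following C++ fragment is only an illustrative sketch and not the thesis implementation. It assumes the common autocorrelation method with the Levinson-Durbin recursion for the LP coefficients, and it assumes the simplest possible statistical mapping, matching the mean and standard deviation of a scalar feature (for example the pitch period or the frame energy) between source and target speakers. All names (`autocorrelate`, `lpCoefficients`, `FeatureStats`, `computeStats`, `mapFeature`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Autocorrelation values r[0..order] of one speech frame.
std::vector<double> autocorrelate(const std::vector<double>& frame,
                                  std::size_t order) {
    assert(order < frame.size());
    std::vector<double> r(order + 1, 0.0);
    for (std::size_t lag = 0; lag <= order; ++lag)
        for (std::size_t n = lag; n < frame.size(); ++n)
            r[lag] += frame[n] * frame[n - lag];
    return r;
}

// Levinson-Durbin recursion. Returns predictor coefficients a[1..p]
// (a[0] is fixed to 1 and unused) so that s[n] is approximated by
// a[1]*s[n-1] + ... + a[p]*s[n-p].
std::vector<double> lpCoefficients(const std::vector<double>& r) {
    assert(!r.empty());
    const std::size_t p = r.size() - 1;
    std::vector<double> a(p + 1, 0.0);
    a[0] = 1.0;
    double err = r[0];  // prediction error energy
    for (std::size_t i = 1; i <= p; ++i) {
        if (err <= 0.0) break;  // silent or degenerate frame: stop early
        double acc = r[i];
        for (std::size_t k = 1; k < i; ++k) acc -= a[k] * r[i - k];
        const double reflection = acc / err;
        const std::vector<double> prev(a);
        a[i] = reflection;
        for (std::size_t k = 1; k < i; ++k)
            a[k] = prev[k] - reflection * prev[i - k];
        err *= (1.0 - reflection * reflection);
    }
    return a;
}

// Mean and (population) standard deviation of one scalar feature,
// collected over a speaker's training frames.
struct FeatureStats {
    double mean;
    double stddev;
};

FeatureStats computeStats(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) sum += v;
    const double mean = sum / static_cast<double>(values.size());
    double squares = 0.0;
    for (double v : values) squares += (v - mean) * (v - mean);
    return {mean, std::sqrt(squares / static_cast<double>(values.size()))};
}

// Assumed mapping: shift and scale a source value so that the mapped
// feature has the target speaker's mean and standard deviation.
double mapFeature(double value, const FeatureStats& source,
                  const FeatureStats& target) {
    if (source.stddev == 0.0) return target.mean;  // constant source feature
    return target.mean + (value - source.mean) * (target.stddev / source.stddev);
}
```

In a conversion loop under these assumptions, each source frame would be analysed with `autocorrelate` and `lpCoefficients`, and its pitch period would be passed through `mapFeature` before resynthesis. Applying the same shift-and-scale mapping directly to LP coefficients is not guaranteed to yield a stable synthesis filter, which is one reason the sketch restricts it to scalar features.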