Fundamentals of Speech Processing - Dr. Ghulam Muhammad


Speech Processing: (Speech Recognition and Speech Synthesis)


Block diagram of speech recognition and speech synthesis

Figure 1: Block diagram of speech recognition and speech synthesis. (ASR - Automatic Speech Recognition; TTS - Text To Speech.)


Speech is the most natural means of human-to-human communication. Nearly forty years ago, Arthur C. Clarke portrayed a machine, HAL 9000, which became very popular for its ability to listen and speak to humans. In an age when computers were still a recent invention (ENIAC, the Electronic Numerical Integrator And Computer built by John W. Mauchly and John P. Eckert and often regarded as the first general-purpose electronic computer, was unveiled in 1946) and programming a computer required special skills, the popularity of HAL 9000 illustrated the universal appeal of speech as an interface to machines. Today, in 2006, computers are ubiquitous and manage an increasingly large number of tasks that range from the mundane to the critical.

Recent advances in computer performance, coupled with efforts to improve speech recognition technology, have enabled the development of practical applications. However, a fully automatic speech-based interface to products, which would encompass real-time speech processing as well as language understanding, is still considered to be many years away.

Automatic Speech Recognition (ASR) refers to the problem of automatically extracting a transcription of the linguistic content of an acoustic speech signal. Current ASR systems perform acceptably in controlled environments, well enough to be deployed in commercial products. However, when used in noisy conditions, their performance deteriorates rapidly to the point where they are unusable in practice and fall far behind human performance. This degradation results from the mismatch between training and testing conditions. Research on ASR therefore tries to reduce this mismatch in order to improve accuracy.
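
One common way to study the mismatch between training and testing conditions is to corrupt clean speech with noise at a chosen signal-to-noise ratio (SNR) and observe how a recognizer degrades. The following is only a minimal sketch of that idea and not a method taken from these notes: the function names, the 440 Hz sine standing in for clean speech, the 8000 Hz sampling rate, and the Gaussian noise are all illustrative assumptions.

```python
import math
import random


def signal_power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in x) / len(x)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then mix."""
    p_clean = signal_power(clean)
    p_noise = signal_power(noise)
    # Noise power required for the target SNR: P_clean / 10^(SNR_dB / 10)
    target_noise_power = p_clean / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(residual))


# Hypothetical stand-ins: a sine wave for "clean speech", Gaussian noise
# for the "noisy condition". Real experiments would use recorded audio.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]
noisy = add_noise_at_snr(clean, noise, snr_db=5.0)

print(round(measured_snr_db(clean, noisy), 3))
```

A system trained only on signals like `clean` and tested on signals like `noisy` sees exactly the kind of train/test mismatch described above. Lowering `snr_db` makes the mismatch more severe.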

On the other hand, Text-To-Speech (TTS), or speech synthesis, refers to the problem of reconstructing a speech waveform from text or from speech feature vectors. The main difficulty lies in the missing phase information or speaker characteristics, which the input does not carry. Research on speech synthesis has to cope with these difficulties.
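
To see why missing phase information is a problem, consider a small numerical illustration. It assumes, for the sake of the example, that the feature vectors keep only spectral magnitudes. The sketch builds two different waveforms (a short sequence and its time-reversed copy) whose discrete Fourier transform (DFT) magnitudes are identical, so the magnitudes alone cannot tell a synthesizer which waveform to reconstruct. The sample values and helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitudes(x):
    """Magnitude spectrum: the phase of each DFT bin is discarded."""
    return [abs(c) for c in dft(x)]


# A short asymmetric waveform and its time-reversed copy.
x = [0.0, 1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0]
y = x[::-1]

mag_x = magnitudes(x)
mag_y = magnitudes(y)

# Largest difference between the two magnitude spectra (numerically ~0).
max_diff = max(abs(a - b) for a, b in zip(mag_x, mag_y))

print("waveforms differ:", x != y)
print("max magnitude difference:", max_diff)
```

For a real-valued signal, time reversal conjugates each DFT bin and multiplies it by a unit-magnitude phase factor, which changes the phases but not the magnitudes. Both `x` and `y` are therefore valid reconstructions of the same magnitude spectrum, and a synthesizer needs extra information or assumptions to choose between them.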


