Voice Activity Detection in Noise Robust Speech Recognition

Pasanen, Antti

In this thesis, Voice Activity Detection (VAD) algorithms are integrated into an automatic speech recognition (ASR) system. The VAD is assumed to give the ASR system additional information about the presence of speech, thereby increasing its robustness. Two standard VAD algorithms (G.729b and GSM) are described, and a statistical VAD based on Gaussian Mixture Models (GMM) is introduced. For the GMM-based VAD, different adaptation techniques are employed to track the changing background noise statistics. The VAD algorithms are integrated with the ASR system using both explicit and implicit approaches. In the explicit approach, the VAD is a separate module in the front end of the ASR system, whereas in the implicit approach the VAD decision is included in the decoding stage of the speech recognition unit. The performance of the VAD algorithms is compared directly using frame classification rates and indirectly using recognition rates. Recognition is performed as a small-vocabulary isolated word recognition task with a Hidden Markov Model (HMM) based ASR system using normalized Mel-frequency cepstral coefficients. According to our simulations, ideal information about the word boundaries significantly increases the recognition accuracy of the ASR system. However, the described VAD algorithms were not able to increase the recognition accuracy significantly.
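
To illustrate the direct comparison, frame classification rates can be obtained by comparing the per-frame VAD decisions with reference speech/non-speech labels. The following is only a minimal sketch and not the evaluation code used in the thesis: the function name, the label convention (1 = speech, 0 = non-speech), the three reported rates, and the example data are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def frame_classification_rates(
    reference: Sequence[int], decisions: Sequence[int]
) -> Dict[str, float]:
    """Compare per-frame VAD decisions with reference labels.

    Assumed convention (hypothetical): 1 = speech frame, 0 = non-speech frame.
    Returns the fraction of correctly classified speech frames, non-speech
    frames, and frames overall.
    """
    if len(reference) != len(decisions):
        raise ValueError("reference and decisions must have the same length")

    speech_total = sum(1 for r in reference if r)
    nonspeech_total = len(reference) - speech_total

    # A frame is a hit when the VAD decision agrees with the reference label.
    speech_hits = sum(1 for r, d in zip(reference, decisions) if r and d)
    nonspeech_hits = sum(
        1 for r, d in zip(reference, decisions) if not r and not d
    )

    nan = float("nan")
    return {
        "speech": speech_hits / speech_total if speech_total else nan,
        "nonspeech": nonspeech_hits / nonspeech_total if nonspeech_total else nan,
        "overall": (speech_hits + nonspeech_hits) / len(reference)
        if len(reference)
        else nan,
    }


# Made-up example: the VAD triggers one frame early and releases one frame early.
reference = [0, 0, 1, 1, 1, 1, 0, 0]
decisions = [0, 1, 1, 1, 1, 0, 0, 0]
rates = frame_classification_rates(reference, decisions)
print(rates)
```

Reporting the speech and non-speech rates separately, rather than a single overall figure, shows whether a detector tends to clip speech or to pass noise.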