Voice activity detection in the presence of breathing noise

Myllymäki, Mikko

Voice activity detection is the process of recognizing speech segments in an input signal that consists of speech, pauses in the speech, silence, breathing and acoustic interference. Voice activity detection algorithms are an important part of many communication devices, such as mobile phones, because they can be used, for example, to reduce battery consumption and bandwidth usage. However, communication devices and the circumstances in which they are used vary greatly, so no single voice activity detection algorithm works effectively in every case; instead, the algorithm has to be developed specifically for the problem at hand. In this thesis, a voice activity detection algorithm was developed for circumstances in which a very high-level breathing sound is present in the signal. Because this property is unique compared to previous studies, previously developed voice activity detection algorithms could not be used. Instead, a new voice activity detection algorithm was developed that consists of framewise feature extraction, classification of the features and postprocessing. This was done by testing many different options for each part of the algorithm, systematically evaluating their contribution to the detection results and selecting the best combination of parts as the final algorithm. The final voice activity detection algorithm consists of Mel-frequency band energies as the features, a neural network as the classifier and a hidden Markov model as the postprocessing method. All the options considered for the algorithm parts and the results obtained with the different algorithms are presented in the thesis.
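
As a rough illustration of the three-stage structure described above, the following Python sketch computes framewise log Mel-band energies and smooths per-frame speech probabilities with a two-state hidden Markov model decoded by the Viterbi algorithm. It is not the implementation from the thesis. The frame length, hop size, number of bands and the self-transition probability `p_stay` are illustrative assumptions, and the function names are hypothetical. The neural network stage is left out: the smoothing step simply assumes that some classifier has already produced a speech probability for each frame, and it uses those probabilities directly as emission scores, which is a simplification.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_bands, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_bands + 2))
    bank = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_band_energies(signal, sr, frame_len=400, hop=160, n_bands=20):
    """Log Mel-band energies per frame, shape (n_frames, n_bands).

    Assumes len(signal) >= frame_len; the default sizes are illustrative.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bank = mel_filterbank(n_bands, frame_len, sr)
    return np.log(power @ bank.T + 1e-10)


def viterbi_smooth(speech_prob, p_stay=0.95):
    """Most likely non-speech (0) / speech (1) path of a two-state HMM.

    speech_prob holds one speech probability per frame from any classifier;
    it is used directly as the emission score (a simplifying assumption).
    """
    speech_prob = np.clip(np.asarray(speech_prob, dtype=float), 1e-6, 1 - 1e-6)
    log_emit = np.log(np.stack([1.0 - speech_prob, speech_prob], axis=1))
    log_trans = np.log(np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]))
    n = len(speech_prob)
    score = np.log(0.5) + log_emit[0]
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans  # cand[i, j]: move from state i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], [0, 1]] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.zeros(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

In this sketch, a high `p_stay` makes the decoder reluctant to switch state, so an isolated frame that the classifier scores as speech (for example a short burst of breathing noise) tends to be absorbed into the surrounding non-speech segment, while longer runs of speech frames are kept.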