Acoustic scene classification using clustering-based acoustic models

Sáez, Natalia Ibáñez

This thesis extends the baseline system for Acoustic Scene Classification developed by the Audio Research Group at Tampere University of Technology for the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. The baseline follows a supervised classification approach composed of a training stage and a testing stage. In the training stage, a statistical model is built to describe each of the environmental classes; these models are then used in the testing stage to classify the test recordings.

The proposed extension clusters the available observations of each class, dividing the class into several subclasses, and builds a separate model for each subclass. These subclass models describe the acoustic environments in more detail, which is intended to yield higher classification accuracy. All other stages of the baseline system are preserved, and k-means is used as the clustering method.

The experiments were first performed on the development dataset, and the results were then validated on the challenge dataset to verify that the system generalizes. Three approaches were tested. First, the same number of clusters was used for all classes; values of 2, 3, 5 and 10 were tested, and with 2 clusters performance increased by 2%. Second, the number of clusters was chosen manually for each class, using the values that gave the best performance for that class during the development stage; this increased performance by 2.3% with respect to the baseline. Third, a more sophisticated approach evaluates the clusterings with the BD and CH indices, which makes it possible to determine the number of clusters for each class automatically; it improved performance by 2% with respect to the baseline.
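
The subclass idea can be illustrated with a minimal sketch, assuming each class is given as a list of frame-level feature vectors. The names train, classify, fit_diag_gaussian and the n_clusters mapping are hypothetical. Each subclass is modelled by a single diagonal Gaussian as a stand-in for whatever statistical model the baseline actually fits, and k-means uses deterministic farthest-point initialisation so the example is reproducible. The abstract does not say how subclass scores are combined at test time; here, as an assumption, a class is scored by its best-matching subclass over the whole recording.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point initialisation (an assumption).

    Returns the non-empty clusters as lists of feature vectors."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        new_centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    return [c for c in clusters if c]


def fit_diag_gaussian(points, var_floor=1e-3):
    """Mean and per-dimension variance, floored to avoid division by zero."""
    mu = centroid(points)
    var = [max(sum((p[d] - mu[d]) ** 2 for p in points) / len(points), var_floor)
           for d in range(len(mu))]
    return mu, var


def log_likelihood(x, model):
    """Log-density of one feature vector under a diagonal Gaussian."""
    mu, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))


def train(frames_by_class, n_clusters):
    """Split each class into k-means subclasses and fit one model per subclass.

    n_clusters maps class label -> number of subclasses (hypothetical interface)."""
    return {label: [fit_diag_gaussian(c) for c in kmeans(frames, n_clusters[label])]
            for label, frames in frames_by_class.items()}


def classify(frames, models):
    """Label whose best subclass model gives the highest total log-likelihood."""
    def class_score(label):
        return max(sum(log_likelihood(x, m) for x in frames) for m in models[label])
    return max(models, key=class_score)
```

Passing the same value for every class in n_clusters mirrors the first approach, while a hand-tuned per-class mapping mirrors the second.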

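For the third approach, the abstract names BD and CH indices but does not define them or say how they are combined. Assuming CH denotes the Calinski-Harabasz index, the sketch below picks, for one class, the number of clusters whose k-means partition maximises it; the BD index is not reproduced. The function choose_n_clusters and the candidate range (2, 3, 4, 5) are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point initialisation; returns non-empty clusters."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        new_centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]


def calinski_harabasz(clusters):
    """Between-cluster dispersion over within-cluster dispersion,
    each normalised by its degrees of freedom (k - 1 and n - k)."""
    points = [p for c in clusters for p in c]
    n, k = len(points), len(clusters)
    if k < 2 or n <= k:
        return 0.0  # index undefined here; treat as the worst score
    overall = centroid(points)
    centers = [centroid(c) for c in clusters]
    between = sum(len(c) * sq_dist(m, overall) for c, m in zip(clusters, centers))
    within = sum(sq_dist(p, m) for c, m in zip(clusters, centers) for p in c)
    if within == 0:
        return float("inf")  # perfectly compact clusters
    return (between / (k - 1)) / (within / (n - k))


def choose_n_clusters(points, candidates=(2, 3, 4, 5)):
    """Candidate k whose k-means partition gives the highest CH index."""
    scores = {k: calinski_harabasz(kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get)
```

A higher CH value indicates clusters that are compact relative to their separation. Because the index is undefined for a single cluster, the candidate range starts at 2; running choose_n_clusters once per class yields a per-class number of subclasses that could then be passed to a training step such as the one sketched for the fixed and manual approaches.
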
Research areas