Automatic Conversion of Emotions in Speech within a Speaker Independent Framework

Lou, Samuel Navarro

Emotions in speech are a fundamental part of natural dialog. In everyday life, vocal interaction between people carries emotion, to a greater or lesser extent, as an intrinsic part of the conversation. Thus, including emotions in human-machine dialog systems is crucial to achieving an acceptable degree of naturalness in communication. This thesis focuses on automatic emotion conversion of speech, a technique that aims to transform an utterance produced in a neutral style into a target emotional state in a speaker-independent context. Emotion conversion is challenging because emotions significantly affect all parts of the human vocal production system, and the conversion process must take all of these factors into account. The techniques used in the literature are based on voice conversion approaches, with minor modifications to create the sensation of emotion. This thesis also builds on the idea of voice conversion systems, but divides the usual regression process into a two-step procedure that provides additional speaker normalization, using vocal tract length normalization as a pre-processing technique, to remove the speaker dependency intrinsic to this kind of system. In addition, a new method is proposed to convert the duration trend and the intonation contour of the utterance, taking contextual information into account.

Research areas