In this talk, I will present recent findings on the role of Artificial General Intelligence (AGI) techniques in improving the performance of emotion and engagement recognition systems. The presentation will cover experimental studies from my publications, focusing on the characterization of embarrassment through acoustic modeling and on emotion modeling in remote learning environments.
In both cases, the data were collected under ecologically valid conditions, which led to significant data sparsity. Additionally, the speech data were recorded in both High German and Swiss German, and the substantial linguistic differences between these varieties made the acoustic and phonetic spaces highly diverse.
Beyond data sparsity in emotional and engagement-related cues, we also observed strong subject-specific patterns. Consequently, in speaker-independent modeling scenarios, it is essential to select the most discriminative acoustic feature representations along with appropriate modeling techniques.
To address these challenges, we combined knowledge-based acoustic features with representations extracted from pretrained models commonly used in AGI-oriented speech processing. In this talk, I will also provide details on the data collection process and acoustic modeling approaches.
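As a rough illustration of the speaker-independent setup and the feature combination mentioned above, the following minimal Python sketch concatenates two feature blocks per utterance and builds leave-one-speaker-out folds. The fusion by simple concatenation, the leave-one-speaker-out protocol, all function names, and the toy feature values are assumptions made for illustration only; they are not taken from the publications discussed in the talk.

```python
from statistics import mean, pstdev


def fuse(knowledge_feats, pretrained_feats):
    """Early fusion (assumed here): concatenate both feature blocks per utterance."""
    return [k + p for k, p in zip(knowledge_feats, pretrained_feats)]


def leave_one_speaker_out(speakers):
    """Yield (held_out, train_idx, test_idx) so no speaker appears in both sets."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def fit_scaler(rows):
    """Per-column mean and standard deviation (constant columns get std 1.0)."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def apply_scaler(rows, stats):
    """Standardize rows using statistics estimated elsewhere."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


# Hypothetical toy data: 4 utterances from 2 speakers.
knowledge = [[0.1, 120.0], [0.3, 180.0], [0.2, 150.0], [0.4, 210.0]]
pretrained = [[0.5, -0.2, 0.1], [0.4, 0.0, 0.3], [-0.1, 0.2, 0.0], [0.0, 0.1, -0.3]]
speakers = ["s1", "s1", "s2", "s2"]

fused = fuse(knowledge, pretrained)
folds = list(leave_one_speaker_out(speakers))

for held_out, train_idx, test_idx in folds:
    # Fit the normalization on training speakers only, then apply it to the
    # held-out speaker, so no test statistics leak into training.
    stats = fit_scaler([fused[i] for i in train_idx])
    x_train = apply_scaler([fused[i] for i in train_idx], stats)
    x_test = apply_scaler([fused[i] for i in test_idx], stats)
    print(held_out, len(x_train), len(x_test), len(x_train[0]))
```

Grouping the folds by speaker keeps the subject-specific patterns noted earlier from inflating the evaluation scores. Any classifier could then be trained on `x_train` and scored on `x_test`.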