speech emotion recognition

neural networks, voice processing, and drama



Try other models (not necessarily neural networks).
Extract other audio features to see whether they are better predictors than MFCCs.
Train on larger data sets: 1500 files, with only 200 samples per emotion, are not enough.
Train on natural data, i.e. on recordings of people speaking in unstaged situations, so that the emotional speech sounds more realistic.
Train on more diverse data, i.e. on recordings of people from different cultures and speaking different languages. This is important because the expression of emotions varies across cultures and is also influenced by individual experiences.
Combine speech with facial expressions and text (speech-to-text) for multimodal sentiment analysis.
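
As a rough illustration of the second item above (extracting features other than MFCCs), the sketch below computes two simple frame-level descriptors, zero-crossing rate and RMS energy, using only the Python standard library. It is a minimal sketch under assumptions, not part of the project: the function names, the frame and hop sizes, and the synthetic 440 Hz tone standing in for a real recording are all hypothetical choices made for the example.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(samples, frame_len=400, hop_len=160):
    """Summarise a clip as the mean ZCR and mean RMS energy over its frames.

    frame_len and hop_len are assumed values (25 ms and 10 ms at 16 kHz).
    """
    frames = frame_signal(samples, frame_len, hop_len)
    zcr = [zero_crossing_rate(f) for f in frames]
    rms = [rms_energy(f) for f in frames]
    return {
        "n_frames": len(frames),
        "zcr_mean": sum(zcr) / len(zcr),
        "rms_mean": sum(rms) / len(rms),
    }


# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
SAMPLE_RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
features = extract_features(tone)
print(features)
```

Summary values like these could be appended to the MFCC-based feature vector, or used in its place, to check whether they improve the predictions. On real recordings an audio library would normally handle file loading and framing.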