# Speech Emotion Recognition
This project boils down to neural networks, voice processing, and drama.
## Tech Stack
## How It Works
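The next steps below imply an MFCC-plus-neural-network pipeline trained on roughly 1500 labelled recordings. Here is a minimal sketch of one way that can look, assuming librosa and scikit-learn as the toolchain; the file paths, labels, and layer sizes are placeholders, not this project's actual configuration.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def extract_mfcc(path, n_mfcc=40):
    """Load one recording and average its MFCCs over time into a fixed-length vector."""
    signal, sr = librosa.load(path, sr=None)
    return np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc), axis=1)

# Placeholder file list and labels: substitute the real dataset manifest.
files = ["data/angry_001.wav", "data/happy_001.wav"]
labels = ["angry", "happy"]

X = np.array([extract_mfcc(f) for f in files])
y = np.array(labels)

# Hold out a test split, then fit a small dense network on the MFCC vectors.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```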
## Next Steps
- Try other models, not necessarily neural networks; a sketch of non-neural baselines follows this list.
- Extract other audio features to see whether any of them are better predictors than the MFCCs; a sketch of a few candidates also follows this list.
- Train on larger datasets: 1500 files, with only 200 samples per emotion, is not enough.
- Train on natural data, i.e. on recordings of people speaking in unstaged situations, so that the emotional speech sounds more realistic.
- Train on more diverse data, i.e. on recordings of people from different cultures speaking different languages. This matters because the expression of emotion varies across cultures and is also shaped by individual experience.
- Combine speech with facial expressions and text (via speech-to-text) for multimodal sentiment analysis; a late-fusion sketch closes this section.
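For the first idea, a hedged sketch of how non-neural baselines could be compared on the same MFCC vectors (`X`, `y` from the pipeline sketch above); the model choices and hyperparameters are illustrative, not tuned.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Candidate non-neural models, scored with 5-fold cross-validation on the
# MFCC vectors (X, y) built in the pipeline sketch above.
candidates = {
    "svm_rbf": SVC(kernel="rbf", C=10),
    "random_forest": RandomForestClassifier(n_estimators=300),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```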
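For the feature-extraction idea, a sketch of a few alternative librosa features, each averaged over time like the MFCC baseline so they can be evaluated as predictors one by one or concatenated; the selection is an example, and librosa offers more (e.g. tonnetz, spectral rolloff).

```python
import numpy as np
import librosa

def extract_feature_sets(path):
    """Return several candidate feature vectors for one recording,
    each averaged over time like the MFCC baseline."""
    signal, sr = librosa.load(path, sr=None)
    stft = np.abs(librosa.stft(signal))
    return {
        "mfcc": np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40), axis=1),
        "chroma": np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1),
        "mel": np.mean(librosa.feature.melspectrogram(y=signal, sr=sr), axis=1),
        "contrast": np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr), axis=1),
        "zcr": np.mean(librosa.feature.zero_crossing_rate(signal), axis=1),
    }
```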
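For the multimodal idea, one common approach is late fusion: run separate per-modality classifiers and average their class probabilities. The sketch below covers only the speech-plus-text half (a facial-expression model could be fused the same way) and leans on assumptions: a trained `audio_model` (e.g. the MLP above), a `text_model` trained on transcripts (e.g. a scikit-learn pipeline with a TF-IDF vectorizer), and a `transcribe()` placeholder standing in for any speech-to-text system. None of these are part of this repo yet.

```python
import numpy as np

def predict_multimodal(path, audio_model, text_model, transcribe):
    """Late fusion: average class probabilities from the audio and text models.

    `transcribe` stands in for any speech-to-text call; both models are
    assumed to expose predict_proba over the same emotion classes in the
    same order. Reuses extract_mfcc from the pipeline sketch above.
    """
    audio_probs = audio_model.predict_proba([extract_mfcc(path)])[0]
    transcript = transcribe(path)
    text_probs = text_model.predict_proba([transcript])[0]
    fused = (audio_probs + text_probs) / 2   # equal weights as a simple default
    return audio_model.classes_[int(np.argmax(fused))]
```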