speech emotion recognition

this project boils down to

neural networks, voice processing, and drama

tech stack

Python, TensorFlow, pandas, scikit-learn, Google Colab

how it works

project image

next steps

  • Try other models (not necessarily neural networks).
  • Extract other audio features to see if they are better predictors than MFCCs.
  • Train on larger data sets, since 1500 files, with only 200 samples per emotion, are not enough.
  • Train on natural data, i.e. on recordings of people speaking in unstaged situations, so that the emotional speech in the training data is more realistic.
  • Train on more diverse data, i.e. on recordings of people from different cultures speaking different languages. This is important because the expression of emotions varies across cultures and is also influenced by individual experiences.
  • Combine speech with facial expressions and text (speech-to-text) for multimodal sentiment analysis.
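
As a starting point for the "other audio features" step above, here is a minimal sketch that computes three common alternatives to MFCCs using only NumPy: zero-crossing rate, RMS energy, and spectral centroid. The choice of features, the function names (`frame_signal`, `extract_features`), and the frame and hop sizes are assumptions made for illustration; they are not taken from this project's code. A synthetic 440 Hz tone stands in for a real recording.

```python
import numpy as np


def frame_signal(signal, frame_length=1024, hop_length=512):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)[:, None]
    offsets = np.arange(frame_length)[None, :]
    return signal[starts + offsets]


def extract_features(signal, sample_rate, frame_length=1024, hop_length=512):
    """Return mean and std over frames of ZCR, RMS energy, spectral centroid.

    Hypothetical helper: the output is a fixed-length vector of 6 values
    that could be fed to a classifier in place of (or next to) MFCCs.
    """
    frames = frame_signal(signal, frame_length, hop_length)

    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # RMS energy: a rough measure of loudness per frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))

    # Spectral centroid: magnitude-weighted mean frequency per frame.
    window = np.hanning(frame_length)
    magnitudes = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sample_rate)
    centroid = (magnitudes @ freqs) / (magnitudes.sum(axis=1) + 1e-10)

    per_frame = np.stack([zcr, rms, centroid], axis=1)
    # Summarize the variable-length frame sequence as one fixed-length vector.
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])


# Synthetic one-second 440 Hz tone used as a placeholder for a recording.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

features = extract_features(tone, sample_rate)
print(features)
```

The resulting vector can be concatenated with the MFCC features, or used on its own, to compare how well each feature set predicts the emotion labels.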