Abstract
Emotion recognition from real-life speech is a challenging problem that has recently attracted much attention. In light of this, we study spontaneous speech emotion estimation at the following two levels. At the theoretical level, we adopt the two-dimensional valence-arousal emotion plane to describe real-life emotions, instead of the traditional discrete representation. This continuous perspective allows the wide variety of emotions in spontaneous speech to be represented tractably. At the implementation level, we first establish a small-scale corpus of spontaneous speech containing 777 utterances. Then, to estimate continuous-valued emotions from speech, three regression algorithms are adopted as estimators. Experimental results show that the Elman Recurrent Neural Network outperforms Fuzzy k-Nearest Neighbor and Support Vector Regression and is better suited to the emotion estimation task, yielding the smallest mean square errors and the highest R-square values, which reach 80.84% for valence and 85.64% for arousal.
| Original language | English |
|---|---|
| Pages (from-to) | 308-316 |
| Number of pages | 9 |
| Journal | Journal of Convergence Information Technology |
| Volume | 6 |
| Issue number | 6 |
| State | Published - Jun 2011 |
| Externally published | Yes |
Keywords
- Emotion Recognition
- Real-life Speech
- Valence-arousal Emotion Space