Multi-modal Emotion Recognition Based on Deep Learning in Speech, Video and Text

  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Emotions are a concrete manifestation of human communication, and research on emotion recognition has steadily increased. Recently, researchers have attached great importance to multimodal emotion recognition, and a large body of work has been carried out on emotion recognition from speech, video, text and physiological signals. Multimodal emotion recognition fuses information from different modalities so that they complement one another, thereby improving the final recognition rate. This paper preprocesses the speech, video and text modalities of the IEMOCAP dataset, uses deep neural networks to extract emotional features, and fuses the information at the feature level. Five emotion classes are considered: angry, excited, sad, neutral and happy. The three-modality emotion recognition model reaches an accuracy of 0.9541 on the training set and 0.68383 on the validation set. Compared with speech emotion recognition alone, accuracy improves by 0.11751.
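
The feature-level fusion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature dimensions, the random stand-in features, the single linear layer, and the names `fuse_features` and `classify` are all assumptions made for illustration. It shows only the general idea that per-modality feature vectors are concatenated into one vector before a shared classifier scores the five emotion classes.

```python
import math
import random

# The five emotion classes listed in the abstract.
EMOTIONS = ["angry", "excited", "sad", "neutral", "happy"]


def fuse_features(speech, video, text):
    """Feature-level fusion: concatenate the per-modality feature vectors."""
    return list(speech) + list(video) + list(text)


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Hypothetical linear classifier applied to the fused feature vector."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    return EMOTIONS[probs.index(max(probs))], probs


# Random stand-ins for features that per-modality networks would extract.
# The dimensions 8, 6 and 4 are arbitrary and not taken from the paper.
rng = random.Random(0)
speech_vec = [rng.gauss(0, 1) for _ in range(8)]
video_vec = [rng.gauss(0, 1) for _ in range(6)]
text_vec = [rng.gauss(0, 1) for _ in range(4)]

fused_vec = fuse_features(speech_vec, video_vec, text_vec)

# Untrained, randomly initialised weights: one row per emotion class.
weights = [[rng.gauss(0, 0.1) for _ in fused_vec] for _ in EMOTIONS]
bias = [0.0] * len(EMOTIONS)

label, probs = classify(fused_vec, weights, bias)
print(len(fused_vec), label)
```

Because the weights here are untrained, the predicted label carries no meaning; the point is only that the classifier sees a single vector whose length is the sum of the three modality dimensions.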

Original language: English
Title of host publication: 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 328-333
Number of pages: 6
ISBN (Electronic): 9781728168968
State: Published - 23 Oct 2020
Externally published: Yes
Event: 5th IEEE International Conference on Signal and Image Processing, ICSIP 2020 - Virtual, Nanjing, China
Duration: 23 Oct 2020 to 25 Oct 2020

Publication series

Name: 2020 IEEE 5th International Conference on Signal and Image Processing, ICSIP 2020

Conference

Conference: 5th IEEE International Conference on Signal and Image Processing, ICSIP 2020
Country/Territory: China
City: Virtual, Nanjing
Period: 23/10/20 to 25/10/20

Keywords

  • Multimodal emotion recognition
  • deep learning
  • feature level fusion
