
Multi-Modal Emotion Recognition via Hierarchical Knowledge Distillation

  • Teng Sun
  • Yinwei Wei*
  • Juntong Ni
  • Zixin Liu
  • Xuemeng Song
  • Yaowei Wang
  • Liqiang Nie

*Corresponding author for this work

Affiliations:
  • Shandong University
  • Peng Cheng Laboratory
  • National University of Singapore
  • Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Due to its wide applications, multi-modal emotion recognition has gained increasing research attention. Although existing methods have achieved compelling success with various multi-modal fusion strategies, they overlook that the dominant modality (e.g., text) may create a shortcut and hence negatively affect the representation learning of the other modalities (e.g., image and audio). To alleviate this problem, we resort to knowledge distillation to narrow the gap between different modalities. In particular, we develop a new Hierarchical Knowledge Distillation model for Multi-modal Emotion Recognition (HKD-MER), consisting of three components: feature extraction, hierarchical knowledge distillation, and attentive multi-modal fusion. As the major contribution of our proposed model, the hierarchical knowledge distillation is designed to transfer knowledge from the dominant modality to the others at both the feature and label levels. It boosts the performance of the non-dominant modalities by modeling the inter-modal relations between different modalities. We have verified the effectiveness of our proposed model on two benchmark datasets.
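
The abstract describes two distillation levels (feature and label) plus an attentive fusion step. The following is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation: all module names, dimensions, loss weights, and the choice of MSE for feature alignment (the paper models inter-modal relations, e.g., via contrastive learning) are assumptions made for illustration.

```python
# Hypothetical sketch of hierarchical (feature- and label-level) distillation
# from a dominant text modality to image/audio, with attentive fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalKDSketch(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 7, temperature: float = 2.0):
        super().__init__()
        self.temperature = temperature
        # Per-modality classification heads (hypothetical).
        self.text_head = nn.Linear(dim, num_classes)
        self.image_head = nn.Linear(dim, num_classes)
        self.audio_head = nn.Linear(dim, num_classes)
        # Attentive fusion: score each modality's feature, then weight-sum.
        self.attn = nn.Linear(dim, 1)
        self.fusion_head = nn.Linear(dim, num_classes)

    def forward(self, text_f, image_f, audio_f, labels):
        # Label-level KD: soften the dominant modality's logits and pull the
        # other modalities' predictions toward them via KL divergence.
        t_logits = self.text_head(text_f)
        i_logits = self.image_head(image_f)
        a_logits = self.audio_head(audio_f)
        teacher = F.softmax(t_logits.detach() / self.temperature, dim=-1)
        label_kd = sum(
            F.kl_div(F.log_softmax(s / self.temperature, dim=-1), teacher,
                     reduction="batchmean")
            for s in (i_logits, a_logits)
        )

        # Feature-level KD: align non-dominant features with the dominant
        # modality's features (plain MSE here, as a stand-in for the paper's
        # inter-modal relation modeling).
        feat_kd = F.mse_loss(image_f, text_f.detach()) + \
                  F.mse_loss(audio_f, text_f.detach())

        # Attentive multi-modal fusion for the final emotion prediction.
        feats = torch.stack([text_f, image_f, audio_f], dim=1)   # (B, 3, dim)
        weights = F.softmax(self.attn(feats), dim=1)             # (B, 3, 1)
        fused = (weights * feats).sum(dim=1)                     # (B, dim)
        task_loss = F.cross_entropy(self.fusion_head(fused), labels)

        # Equal loss weighting is an arbitrary choice for this sketch.
        return task_loss + label_kd + feat_kd
```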

Original language: English
Pages (from-to): 9036-9046
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 26
DOIs
State: Published - 2024
Externally published: Yes

Keywords

  • Contrastive learning
  • emotion recognition
  • knowledge distillation
  • multi-modal representation learning
