
Multi-task mutual learning for multimodal emotion-cause pair extraction in conversations

  • Geng Tu
  • Jun Wang
  • Li Yang
  • Bin Liang
  • Erik Cambria
  • Wenjie Li
  • Ruifeng Xu*
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Chinese University of Hong Kong
  • Nanyang Technological University
  • Hong Kong Polytechnic University
  • Peng Cheng Laboratory
  • Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal Emotion-Cause Pair Extraction in Conversations (MC-ECPE) aims to jointly identify emotions and their corresponding causes in conversations across multiple modalities. Early approaches to Emotion-Cause Pair Extraction (ECPE) adopted a two-step framework that first extracts emotions and causes and then pairs them, which leads to error accumulation. This has motivated growing interest in end-to-end ECPE. Despite this progress, emotion extraction and cause extraction are inherently mutually dependent, yet existing methods fail to model this dependency in depth. In addition, baselines for the MC-ECPE task mainly rely on traditional fusion methods such as concatenation, which limits cross-modal context understanding. To address these issues, we propose the Multi-Task Mutual Learning (MTML) framework, which uses both implicit and explicit modeling strategies to capture the mutual dependency between emotion extraction and cause extraction. Specifically, we introduce a Multimodal Interactive Graph Attention Network (MIGAT) with three types of connections: intra-modal connections for conversational context, cross-modal connections for multimodal fusion, and cross-task connections for capturing dependencies between the emotion and cause extraction tasks. Implicit modeling leverages the cross-task connections within MIGAT together with information from shared components in progressive multi-task learning (PMTL), while explicit modeling iteratively extracts emotion and cause probability distributions to enhance subsequent reasoning. Experimental results demonstrate that MTML outperforms state-of-the-art methods.
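The abstract does not specify how MIGAT's nodes and edges are defined. The Python sketch below shows one possible way to enumerate the three connection types, assuming one graph node per (utterance, modality, task) triple, a text/audio/video modality set, and a fixed context window for intra-modal edges. The function `build_edges`, the `window` parameter, and this node layout are hypothetical illustrations, not the authors' implementation.

```python
from itertools import product

# Assumption: modality set and node layout are illustrative, not from the paper.
MODALITIES = ("text", "audio", "video")
TASKS = ("emotion", "cause")  # the two subtasks named in the abstract


def build_edges(num_utterances, window=2):
    """Enumerate typed directed edges for a hypothetical MIGAT-style graph.

    A node is a tuple (utterance_index, modality, task). Returns the node
    list and a dict mapping each connection type to a set of (src, dst) pairs.
    """
    nodes = list(product(range(num_utterances), MODALITIES, TASKS))
    edges = {"intra_modal": set(), "cross_modal": set(), "cross_task": set()}
    for src, dst in product(nodes, repeat=2):
        if src == dst:
            continue  # no self-loops in this sketch
        (i, m, t), (j, n, s) = src, dst
        if m == n and t == s and abs(i - j) <= window:
            # Same modality and task, nearby utterances: conversational context.
            edges["intra_modal"].add((src, dst))
        elif i == j and t == s and m != n:
            # Same utterance and task, different modality: multimodal fusion.
            edges["cross_modal"].add((src, dst))
        elif i == j and m == n and t != s:
            # Same utterance and modality, different task: emotion <-> cause.
            edges["cross_task"].add((src, dst))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_edges(4, window=2)
    print(len(nodes), {k: len(v) for k, v in edges.items()})
```

Calling `build_edges(4, window=2)` yields 24 nodes with 60 intra-modal, 48 cross-modal, and 24 cross-task directed edges. Keeping the edge types in separate sets would let a graph attention layer apply a different projection per connection type, which is one common way to handle typed edges; whether MIGAT does this is not stated in the abstract.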

Original language: English
Article number: 103877
Journal: Information Fusion
Volume: 127
State: Published - Mar 2026
Externally published: Yes

Keywords

  • Graph attention network
  • Multi-task mutual learning
  • Multimodal emotion-cause pair extraction in conversations
