
CCAF: Coarse-to-fine Cross-Modal Alignment and Fusion for Multimodal Sentiment Analysis

  • Xianbing Zhao
  • Shengzun Yang
  • Buzhou Tang*
  • *Corresponding author for this work
  • Jiangnan University
  • Guangdong Provincial Key Laboratory of Intelligent Information Processing
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multimodal sentiment analysis (MSA) has advanced remarkably in recent years. Existing MSA methods focus primarily on learning coarse-grained representations from different modalities to perform global cross-modal alignment or fusion. However, these approaches often neglect valuable fine-grained sentiment clues derived from local cross-modal interactions. Furthermore, aligning and fusing complex global and local cross-modal information poses significant challenges in MSA tasks. To address these issues, we propose a novel MSA framework that simultaneously captures coarse-grained and fine-grained cross-modal sentiment clues through global and local cross-modal alignment and fusion. Our approach consists of three key components: i) optimal transport-based global and local cross-modal alignment, which separately aligns valuable global and local sentiment clues across modalities; ii) global and local cross-modal gated attention, which fuses the aligned global and local cross-modal representations, respectively; and iii) a prototype-informed information bottleneck, which uses learnable sentiment prototypes and contrastive prototype matching to eliminate redundant cross-modal information at both global and local levels. Extensive experiments on two publicly available MSA datasets demonstrate the effectiveness and superiority of the proposed model.
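As a rough illustration of the first component, the sketch below aligns two token sequences with entropy-regularized optimal transport (standard Sinkhorn iterations). It is not the authors' implementation: the abstract gives no details, so the cosine cost, the uniform marginals, the regularization strength `eps`, the random stand-in features, and every function and variable name here are assumptions chosen only to show the general idea of OT-based cross-modal alignment.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cost 1 - cosine similarity between rows of x and rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=200):
    """Entropy-regularized OT plan via Sinkhorn scaling (assumed setup)."""
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # rescale to match column marginal b
        u = a / (K @ v)                # rescale to match row marginal a
    return u[:, None] * K * v[None, :]

# Hypothetical features: 5 text tokens and 7 audio frames, 16 dims each.
rng = np.random.default_rng(0)
text = rng.normal(size=(5, 16))
audio = rng.normal(size=(7, 16))

# Uniform mass on every token/frame (an assumption for this sketch).
a = np.full(5, 1.0 / 5)
b = np.full(7, 1.0 / 7)

plan = sinkhorn_plan(cosine_cost(text, audio), a, b)

# Barycentric projection: each text token receives a plan-weighted
# average of the audio frames, giving audio features aligned to text.
aligned_audio = (plan / plan.sum(axis=1, keepdims=True)) @ audio
```

In this sketch the transport plan plays the role of a soft correspondence matrix between the two modalities. A global variant could apply the same routine to pooled utterance-level vectors across a batch, while a local variant would apply it to token- or frame-level features as above; how the paper actually separates the two levels is not stated in the abstract.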

Original language: English
Title of host publication: WWW 2026 - Proceedings of the ACM Web Conference 2026
Publisher: Association for Computing Machinery, Inc
Pages: 7421-7430
Number of pages: 10
ISBN (Electronic): 9798400723070
DOIs
State: Published - 12 Apr 2026
Externally published: Yes
Event: 35th ACM Web Conference, WWW 2026 - Dubai, United Arab Emirates
Duration: 29 Jun 2026 to 3 Jul 2026

Publication series

Name: WWW 2026 - Proceedings of the ACM Web Conference 2026

Conference

Conference: 35th ACM Web Conference, WWW 2026
Country/Territory: United Arab Emirates
City: Dubai
Period: 29/06/26 to 3/07/26

Keywords

  • information bottleneck
  • multimodal sentiment analysis
  • prototype learning
