TY - GEN
T1 - CCAF
T2 - 35th ACM Web Conference, WWW 2026
AU - Zhao, Xianbing
AU - Yang, Shengzun
AU - Tang, Buzhou
N1 - Publisher Copyright: © 2026 Owner/Author.
PY - 2026/4/12
Y1 - 2026/4/12
N2 - Multimodal sentiment analysis (MSA) has advanced remarkably in recent years. Existing MSA methods focus primarily on learning coarse-grained representations from different modalities to perform global cross-modal alignment or fusion. However, these approaches often neglect valuable fine-grained sentiment cues derived from local cross-modal interactions. Furthermore, aligning and fusing complex global and local cross-modal information poses significant challenges in MSA tasks. To address these issues, we propose a novel MSA framework that simultaneously captures coarse-grained and fine-grained cross-modal sentiment cues through global and local cross-modal alignment and fusion. Our approach consists of three key components: i) optimal transport-based global and local cross-modal alignment, which separately aligns valuable global and local sentiment cues across modalities; ii) global and local cross-modal gated attention, which fuses the aligned global and local cross-modal representations, respectively; and iii) a prototype-informed information bottleneck, which uses learnable sentiment prototypes and contrastive prototype matching to eliminate redundant cross-modal information at both global and local levels. Extensive experiments on two publicly available MSA datasets demonstrate the effectiveness and superiority of the proposed model.
KW - information bottleneck
KW - multimodal sentiment analysis
KW - prototype learning
UR - https://www.scopus.com/pages/publications/105038550550
U2 - 10.1145/3774904.3792569
DO - 10.1145/3774904.3792569
M3 - Conference contribution
AN - SCOPUS:105038550550
T3 - WWW 2026 - Proceedings of the ACM Web Conference 2026
SP - 7421
EP - 7430
BT - WWW 2026 - Proceedings of the ACM Web Conference 2026
PB - Association for Computing Machinery, Inc
Y2 - 29 June 2026 through 3 July 2026
ER -