
From Feature Alignment to Multimodal Fusion: A Two-Stage Primary Modality-Guided Approach for MSA

  • Guoyu Ma
  • Xiaoqiang Ren
  • Yan Jiang*
  • Hongjiao Guan
  • Bing Xu
  • *Corresponding author for this work
  • Qilu University of Technology
  • Shandong Fundamental Research Center for Computer Science
  • Faculty of Computing, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multimodal Sentiment Analysis (MSA) aims to leverage heterogeneous data (typically the language, vision, and acoustic modalities) to accurately interpret human emotional states. Despite recent advances, challenges persist because intrinsic modality heterogeneity produces differences in feature distributions across modalities. Prior works either neglect the disparity in how much each modality contributes, especially the dominant role of language in sentiment reasoning, or emphasize language dominance in fusion-space alignment while ignoring coordination in the earlier feature space. To address these limitations, we propose a novel Two-Stage Primary Modality-Guided (TSPMG) framework, which introduces primary-modality supervision into both feature-space distribution alignment and fusion-space attention modulation. This dual-level cooperative mechanism progressively amplifies the dominant modality's influence throughout the entire representation learning pipeline. Extensive experiments on two benchmark datasets demonstrate that TSPMG achieves results superior or comparable to those of state-of-the-art baselines, and ablation studies further validate the effectiveness of primary-modality-guided strategies for robust and interpretable multimodal sentiment analysis. The code is available at https://github.com/Kaisa777/TSPMG.
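The abstract names two places where the primary (language) modality steers learning: feature-space distribution alignment and fusion-space attention modulation. The sketch below is not the TSPMG implementation (the linked repository is the authoritative source); it is a minimal, dependency-free illustration under two assumptions: stage-one alignment is approximated by matching the per-dimension means and variances of an auxiliary modality's features to those of the language features, and stage-two fusion uses scaled dot-product attention with language tokens as queries. The names `moment_alignment_loss` and `primary_guided_attention` are hypothetical, and the learned query/key/value projections a real model would use are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens/samples, columns = feature dimensions


def _mean(x: Matrix) -> List[float]:
    """Per-dimension mean over rows."""
    n = len(x)
    return [sum(row[j] for row in x) / n for j in range(len(x[0]))]


def _var(x: Matrix, mu: List[float]) -> List[float]:
    """Per-dimension (population) variance over rows."""
    n = len(x)
    return [sum((row[j] - mu[j]) ** 2 for row in x) / n for j in range(len(mu))]


def moment_alignment_loss(primary: Matrix, auxiliary: Matrix) -> float:
    """Hypothetical stage-1 loss: squared gap between the first two moments of
    primary (language) and auxiliary (vision or audio) features.
    Assumption: in an autograd framework the primary statistics could be
    detached so only the auxiliary encoder moves toward the language side."""
    mp, ma = _mean(primary), _mean(auxiliary)
    vp, va = _var(primary, mp), _var(auxiliary, ma)
    mean_gap = sum((p - a) ** 2 for p, a in zip(mp, ma))
    var_gap = sum((p - a) ** 2 for p, a in zip(vp, va))
    return mean_gap + var_gap


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def primary_guided_attention(primary: Matrix, auxiliary: Matrix) -> Matrix:
    """Hypothetical stage-2 fusion: scaled dot-product attention in which
    primary tokens are the queries and auxiliary tokens serve as both keys and
    values, so the language side decides which auxiliary cues are read."""
    d = len(primary[0])
    fused = []
    for q in primary:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in auxiliary]
        weights = softmax(scores)
        # Each output row is a convex combination of the auxiliary rows.
        fused.append([sum(w * v[j] for w, v in zip(weights, auxiliary)) for j in range(d)])
    return fused
```

For example, `moment_alignment_loss([[0.0, 0.0], [2.0, 2.0]], [[1.0, 1.0], [3.0, 3.0]])` returns 2.0: the means differ by 1 in each of the two dimensions, while the variances already match. Letting the language tokens act as queries is one common way to give a dominant modality control over fusion; whether TSPMG modulates attention this way is not stated here.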

Original language: English
Title of host publication: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Editors: Tat-Seng Chua, Lai-Kuan Wong, Chee Seng Chan, Jinhui Tang, Chong-Wah Ngo, Klaus Schoeffmann, Jiaying Liu, Yo-Sung Ho
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400720055
State: Published, 6 Dec 2025
Externally published: Yes
Event: 7th ACM International Conference on Multimedia in Asia, MMAsia 2025, Kuala Lumpur, Malaysia
Duration: 9 Dec 2025 to 12 Dec 2025

Publication series

Name: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025

Conference

Conference: 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 9/12/25 to 12/12/25

Keywords

  • Feature Alignment
  • Multimodal Fusion
  • Representation Learning
  • Sentiment Analysis
