
Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors

  • Yang Wu
  • Yanyan Zhao*
  • Hao Yang
  • Song Chen
  • Bing Qin
  • Xiaohuan Cao
  • Wenting Zhao
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • AI Lab of China Merchants Bank

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Multimodal sentiment analysis has attracted increasing attention, and many models have been proposed. However, the performance of state-of-the-art models drops sharply when they are deployed in the real world. We find that the main reason is that real-world applications can only access the text output of automatic speech recognition (ASR) models, which may contain errors because of limited model capacity. Through further analysis of the ASR outputs, we find that in some cases the sentiment words, the key sentiment elements in the textual modality, are recognized as other words, which changes the sentiment of the text and directly hurts the performance of multimodal sentiment analysis models. To address this problem, we propose the sentiment word aware multimodal refinement model (SWRM), which dynamically refines erroneous sentiment words by leveraging multimodal sentiment clues. Specifically, we first use the sentiment word position detection module to obtain the most likely position of the sentiment word in the text and then use the multimodal sentiment word refinement module to dynamically refine the sentiment word embeddings. The refined embeddings are taken as the textual inputs of the multimodal feature fusion module to predict the sentiment labels. We conduct extensive experiments on three real-world datasets, MOSI-Speechbrain, MOSI-IBM, and MOSI-iFlytek, and the results demonstrate the effectiveness of our model, which surpasses the current state-of-the-art models on all three datasets. Furthermore, our approach can easily be adapted to other multimodal feature fusion models.
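
The detect-then-refine flow described in the abstract can be illustrated with a toy sketch. This is not the SWRM implementation: the abstract does not specify how positions are scored or how embeddings are refined, so the argmax position choice, the sigmoid gate, and every name below (`detect_position`, `refine_embedding`, `refine_sentence`, `gate_logit`, `candidate`) are assumptions made for illustration only. In a real system the scores, the gate, and the candidate embedding would presumably come from learned modules that use the acoustic and visual clues.

```python
import math


def detect_position(scores):
    """Return the index with the highest score.

    `scores` is an assumed per-token estimate of how likely each position
    is to hold the (possibly misrecognized) sentiment word.
    """
    return max(range(len(scores)), key=lambda i: scores[i])


def sigmoid(x):
    """Map a real-valued logit to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def refine_embedding(original, candidate, gate_logit):
    """Blend the original and candidate embeddings with a scalar gate.

    The gated blend is an assumed refinement rule, not the paper's:
    a gate near 0 keeps the ASR token's embedding, a gate near 1
    replaces it with the candidate embedding.
    """
    g = sigmoid(gate_logit)
    return [(1.0 - g) * o + g * c for o, c in zip(original, candidate)]


def refine_sentence(embeddings, scores, candidate, gate_logit):
    """Refine only the embedding at the detected position.

    Returns the detected position and a new list of embeddings; the
    input list is left unmodified.
    """
    pos = detect_position(scores)
    refined = [list(e) for e in embeddings]
    refined[pos] = refine_embedding(embeddings[pos], candidate, gate_logit)
    return pos, refined
```

The refined list would then stand in for the original text embeddings as input to whichever multimodal fusion model is used, which is the sense in which the abstract says the approach can be adapted to other fusion models.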

Original language: English
Title of host publication: ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher: Association for Computational Linguistics (ACL)
Pages: 1397-1406
Number of pages: 10
ISBN (Electronic): 9781955917254
State: Published - 2022
Event: Findings of the Association for Computational Linguistics: ACL 2022 - Dublin, Ireland
Duration: 22 May 2022 to 27 May 2022

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: Findings of the Association for Computational Linguistics: ACL 2022
Country/Territory: Ireland
City: Dublin
Period: 22/05/22 to 27/05/22
