Abstract
The main research problem in multimodal sentiment analysis is modeling inter-modality dynamics, yet most current work does not address this aspect sufficiently. In this study, we propose MSA-AFN, a multimodal fusion network that considers both the relationships between modalities and the differences in each modality's contribution to the task. Specifically, in the feature extraction process, we consider not only the relationship between audio and text but also the contribution of temporal features to the task. In the multimodal fusion process, a soft attention mechanism weights the feature representation of each modality according to its contribution to the task, and the weighted representations are then connected. We evaluate the proposed approach on CH-SIMS, a Chinese multimodal sentiment analysis dataset. Experiments show that our model outperforms the comparison models. Moreover, adding our network's component to some baselines improves their performance by 0.28% to 9.5%.
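To make the fusion step concrete, here is a minimal sketch of soft-attention weighting over modality features. It is not the MSA-AFN implementation: the abstract does not specify the scoring function, so the dot-product scoring, the reading of "connected" as concatenation, the three modality names, and the names `attention_fuse` and `score_vectors` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, score_vectors):
    """Weight each modality's feature vector by a soft attention weight,
    then concatenate the weighted vectors.

    features:      dict mapping modality name -> list of floats
    score_vectors: dict mapping modality name -> list of floats of the same
                   length (hypothetical stand-in for learned parameters)
    """
    names = list(features)
    # Assumed scoring rule: one scalar per modality via a dot product.
    scores = [
        sum(w * x for w, x in zip(score_vectors[name], features[name]))
        for name in names
    ]
    # Softmax turns the scores into contribution weights that sum to 1.
    weights = softmax(scores)
    fused = []
    for name, alpha in zip(names, weights):
        fused.extend(alpha * x for x in features[name])
    return dict(zip(names, weights)), fused


# Toy inputs; the values are arbitrary and not taken from CH-SIMS.
features = {"text": [0.5, -1.0], "audio": [1.0, 0.0], "video": [0.2, 0.3]}
score_vectors = {"text": [1.0, 0.5], "audio": [0.3, 0.3], "video": [0.1, 0.9]}
weights, fused = attention_fuse(features, score_vectors)
```

In a trained network the scoring parameters would be learned jointly with the rest of the model; here they are fixed so that only the weighting and concatenation are visible.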
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 8207-8217 |
| Number of pages | 11 |
| Journal | Multimedia Tools and Applications |
| Volume | 83 |
| Issue number | 3 |
| State | Published - Jan 2024 |
Keywords
- Attention mechanism
- Multimodal fusion
- Multimodal sentiment analysis
Fingerprint
Dive into the research topics of 'Attention fusion network for multimodal sentiment analysis'. Together they form a unique fingerprint.