TY - GEN
T1 - MAG+
T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
AU - Zhao, Xianbing
AU - Chen, Yixin
AU - Li, Wanting
AU - Gao, Lei
AU - Tang, Buzhou
N1 - Publisher Copyright:
© 2022 IEEE
PY - 2022
Y1 - 2022
AB - Human multimodal sentiment analysis is a challenging task that aims to extract and integrate information from multiple sources, such as language, acoustic, and visual signals. Recently, the multimodal adaptation gate (MAG), an attachment to transformer-based pre-trained language representation models such as BERT and XLNet, has shown state-of-the-art performance on multimodal sentiment analysis. However, MAG uses only a one-layer network to fuse multimodal information directly and does not model the relationships among different modalities. In this paper, we propose an extended MAG, called MAG+, to reinforce multimodal fusion. MAG+ contains two modules: multi-layer MAGs with modality reinforcement (M3R) and adaptive layer aggregation (ALA). In each MAG with modality reinforcement of M3R, every modality is first reinforced by all other modalities via crossmodal attention, and then all modalities are fused via MAG. The ALA module leverages the multimodal representations at both low and high levels to form the final multimodal representation. Like MAG, MAG+ is attached to BERT and XLNet. Experimental results on two widely used datasets demonstrate the efficacy of the proposed MAG+.
KW - BERT
KW - Multimodal Fusion
KW - Multimodal Sentiment Analysis
UR - https://www.scopus.com/pages/publications/85131265928
U2 - 10.1109/ICASSP43922.2022.9746536
DO - 10.1109/ICASSP43922.2022.9746536
M3 - Conference contribution
AN - SCOPUS:85131265928
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4753
EP - 4757
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 May 2022 through 27 May 2022
ER -
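
For readers who want to see the shape of the method described in the abstract above, the following is a minimal PyTorch sketch of the two MAG+ modules: M3R (crossmodal-attention reinforcement of each modality followed by MAG fusion, stacked over several layers) and ALA (a learned weighting over the per-layer fused representations). All module names, dimensions, and the exact gating and scaling details are assumptions reconstructed from the abstract and the gating scheme of the original MAG; this is not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossmodalAttention(nn.Module):
    """Attend a target modality over a source modality (query = target)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, target, source):
        out, _ = self.attn(target, source, source)
        return out

class MAG(nn.Module):
    """Multimodal adaptation gate: shifts the language representation by a
    gated displacement computed from the acoustic and visual streams."""
    def __init__(self, dim, beta=1.0):
        super().__init__()
        self.gate_a = nn.Linear(2 * dim, dim)
        self.gate_v = nn.Linear(2 * dim, dim)
        self.proj_a = nn.Linear(dim, dim)
        self.proj_v = nn.Linear(dim, dim)
        self.beta = beta
        self.norm = nn.LayerNorm(dim)

    def forward(self, lang, acoustic, visual):
        g_a = torch.relu(self.gate_a(torch.cat([lang, acoustic], dim=-1)))
        g_v = torch.relu(self.gate_v(torch.cat([lang, visual], dim=-1)))
        shift = g_a * self.proj_a(acoustic) + g_v * self.proj_v(visual)
        # Cap the shift so it cannot dominate the language embedding.
        scale = torch.clamp(
            lang.norm(dim=-1, keepdim=True)
            / (shift.norm(dim=-1, keepdim=True) + 1e-6),
            max=1.0,
        )
        return self.norm(lang + self.beta * scale * shift)

class M3RLayer(nn.Module):
    """One M3R layer: reinforce every modality with the other two via
    crossmodal attention, then fuse all modalities with MAG."""
    def __init__(self, dim):
        super().__init__()
        self.l_a, self.l_v = CrossmodalAttention(dim), CrossmodalAttention(dim)
        self.a_l, self.a_v = CrossmodalAttention(dim), CrossmodalAttention(dim)
        self.v_l, self.v_a = CrossmodalAttention(dim), CrossmodalAttention(dim)
        self.mag = MAG(dim)

    def forward(self, lang, acoustic, visual):
        l = lang + self.l_a(lang, acoustic) + self.l_v(lang, visual)
        a = acoustic + self.a_l(acoustic, lang) + self.a_v(acoustic, visual)
        v = visual + self.v_l(visual, lang) + self.v_a(visual, acoustic)
        return self.mag(l, a, v), a, v

class MAGPlus(nn.Module):
    """Stacked M3R layers plus ALA: a softmax-weighted sum of the fused
    representation from every layer (low and high levels alike)."""
    def __init__(self, dim, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(M3RLayer(dim) for _ in range(num_layers))
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, lang, acoustic, visual):
        fused_per_layer = []
        for layer in self.layers:
            lang, acoustic, visual = layer(lang, acoustic, visual)
            fused_per_layer.append(lang)
        w = F.softmax(self.layer_weights, dim=0)
        return sum(w_i * h_i for w_i, h_i in zip(w, fused_per_layer))

# Toy usage: batch of 2 sequences, 8 timesteps, 32-dim features per modality.
if __name__ == "__main__":
    l, a, v = (torch.randn(2, 8, 32) for _ in range(3))
    print(MAGPlus(dim=32)(l, a, v).shape)  # torch.Size([2, 8, 32])

The learned softmax weights in MAGPlus are one plausible reading of "leverages the multimodal representations at low and high levels"; the paper may aggregate layers differently, and in practice the module would be attached after each transformer layer of BERT or XLNet rather than run standalone as here.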