TY - GEN
T1 - Multimodal Blockwise Transformer for Robust Sentiment Recognition
AU - Lai, Zhengqin
AU - Hong, Xiaopeng
AU - Wang, Yabin
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/10/28
Y1 - 2024/10/28
AB - The MER-NOISE track challenges participants to classify emotions from multimodal data, specifically audio and visual signals with added noise. In this paper, we present a solution for the NOISE track of the MER2024 competition, which focuses on the robustness of emotion recognition in noisy environments. We propose a novel Multimodal Blockwise Transformer (MBT) architecture that effectively integrates visual, auditory, and textual features to improve emotion classification accuracy. Our approach includes several key innovations: the MBT network structure, the TIE module for weighted encoder input, and momentum contrast. Additionally, we employ diverse data augmentation methods, both conventional and novel, and introduce a confidence-based decision-level fusion strategy to enhance model performance. In the MER2024 NOISE track, our solution achieved a Weighted Average F-score (WAF) of 0.8365, securing third place. This result demonstrates the effectiveness and robustness of our approach in handling noisy data for emotion recognition tasks.
KW - modality robustness
KW - multimodal fusion
KW - multimodal sentiment analysis
UR - https://www.scopus.com/pages/publications/85210809362
U2 - 10.1145/3689092.3689399
DO - 10.1145/3689092.3689399
M3 - Conference contribution
AN - SCOPUS:85210809362
T3 - MRAC 2024 - Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing
SP - 88
EP - 92
BT - MRAC 2024 - Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing
PB - Association for Computing Machinery, Inc
T2 - 2nd International Workshop on Multimodal and Responsible Affective Computing, MRAC 2024
Y2 - 28 October 2024 through 1 November 2024
ER -