TY - GEN
T1 - Local-Global Cross-Fusion Transformer Network for Facial Expression Recognition
AU - Liu, Yicheng
AU - Li, Zecheng
AU - Zhang, Yanbo
AU - Wen, Jie
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024
Y1 - 2024
AB - Facial Expression Recognition (FER) has received increasing attention in the computer vision community. For FER, facial images present two challenging issues: large inter-class similarity and small intra-class discrepancy. To address these challenges and achieve better performance, we propose a Local-Global Cross-Fusion Transformer network in this paper. Specifically, the method seeks to obtain a more discriminative facial representation by fully considering both the local features of multiple facial regions and the global features of the whole face. To extract features from the critical local regions of the face, a local feature decomposition module based on facial landmarks is designed. In addition, a local-global cross-fusion Transformer is designed to enhance the synergistic correlation between local and global features using a cross-attention mechanism, which maximizes attention on key regions while capturing the relationships among local regions. Extensive experiments on three mainstream expression recognition datasets, RAF-DB, FERPlus, and AffectNet, show that the method outperforms many existing expression recognition methods and significantly improves recognition accuracy.
KW - cross-attention mechanism
KW - facial expression recognition
KW - facial landmark
KW - local and global facial features
UR - https://www.scopus.com/pages/publications/85192737236
U2 - 10.1007/978-981-97-2390-4_18
DO - 10.1007/978-981-97-2390-4_18
M3 - Conference contribution
AN - SCOPUS:85192737236
SN - 9789819723898
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 254
EP - 269
BT - Web and Big Data - 7th International Joint Conference, APWeb-WAIM 2023, Proceedings
A2 - Song, Xiangyu
A2 - Feng, Ruyi
A2 - Chen, Yunliang
A2 - Li, Jianxin
A2 - Min, Geyong
PB - Springer Science and Business Media Deutschland GmbH
T2 - 7th Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data, APWeb-WAIM 2023
Y2 - 6 October 2023 through 8 October 2023
ER -