A generative multimodal network for facial expression recognition

  • Yue Zhao
  • Mingjian Song
  • Qi Zhang*
  • Jiawei Yang
  • Kenji Yoshigoe
  • Chunwei Tian
  • *Corresponding author for this work
  • Wenzhou-Kean University
  • Northwestern Polytechnical University Xian
  • School of Economics and Management, Harbin Institute of Technology Weihai
  • University of Turku
  • Embry-Riddle Aeronautical University

Research output: Contribution to journal › Article › peer-review

Abstract

Deep networks with strong feature extraction abilities have been extensively employed in facial expression recognition (FER). However, they rely on structural information derived from data dependencies rather than on facial attributes, which limits the robustness of the resulting FER models. In this paper, we propose a generative multimodal network (GMNet) for FER. First, GMNet generates and aligns multimodal face images based on facial asymmetry and the mirror imaging principle. Second, it uses parallel networks to learn diverse information from the original and the generated multimodal face images, respectively, and merges the results to obtain reliable facial expression information. Third, a sparse mechanism further refines these richer facial features, yielding more accurate facial expression information and reducing training cost. Finally, a cross loss applies a cross-domain constraint to ensure the reliability of the multimodal face images, improving facial expression recognition performance. Experimental results show that GMNet outperforms other popular FER methods. The code for GMNet is available at https://github.com/hellloxiaotian/GMNet.
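
As an illustration only, the mirror imaging idea mentioned in the abstract can be sketched as building two symmetric composites from a single face image, one from its left half and one from its right half. This is an assumption about what "mirror imaging" might involve, not the GMNet implementation: the abstract does not specify the generation or alignment procedure, and the function name `mirror_composites` and the list-of-rows image format are hypothetical. Plain Python lists are used to keep the sketch dependency-free.

```python
def mirror_composites(image):
    """Build left-left and right-right mirrored composites of an image.

    `image` is a list of rows, each row a list of pixel values.
    Hypothetical helper for illustration; not taken from the GMNet code.
    """
    width = len(image[0])
    half = width // 2
    left_composite = []
    right_composite = []
    for row in image:
        left = row[:half]                # left half of the row
        right = row[width - half:]       # right half of the row
        middle = row[half:width - half]  # center column if width is odd, else empty
        # Left half followed by its own mirror image.
        left_composite.append(left + middle + left[::-1])
        # Mirror image of the right half followed by the right half.
        right_composite.append(right[::-1] + middle + right)
    return left_composite, right_composite


if __name__ == "__main__":
    toy_face = [[1, 2, 3, 4],
                [5, 6, 7, 8]]
    left_face, right_face = mirror_composites(toy_face)
    print(left_face)
    print(right_face)
```

Because real faces are not perfectly symmetric, the two composites differ from each other and from the original, which is one way a single image could yield several views. For the authors' actual procedure, consult the repository linked in the abstract.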

Original language: English
Article number: 113518
Journal: Pattern Recognition
Volume: 179
DOIs
State: Published - Nov 2026
Externally published: Yes

Keywords

  • Cross-domain interaction
  • Facial expression recognition
  • Generative method
  • Multimodal technique
