
Multimodal Local Global Interaction Networks for Automatic Depression Severity Estimation

  • Mingyue Niu
  • Zhuhong Shao
  • Yongjun He
  • Jianhua Tao*
  • Björn W. Schuller
  • *Corresponding author for this work
  • Yanshan University
  • Capital Normal University
  • Beijing Engineering Research Center of Highly Reliable Embedded System
  • School of Computer Science and Technology, Harbin Institute of Technology
  • Tsinghua University
  • University of Chinese Academy of Sciences
  • Augsburg University
  • Imperial College London

Research output: Contribution to journal › Article › peer-review

Abstract

Physiological studies have shown that differences between depressed and healthy individuals are manifested in the audio and video modalities. Hence, some researchers have combined local and global information from the audio or video modality to obtain a unimodal representation, and then used attention mechanisms or Multi-Layer Perceptrons (MLPs) to fuse the different representations. However, attention mechanisms and MLPs are essentially linear aggregation methods. They lack the ability to explore element-wise interactions between local and global representations within and across modalities, which limits the accuracy of depression severity estimation. To address this, we propose a Representation Interaction (RI) module, which uses mutual linear adjustment to achieve element-wise interaction between representations. The RI module can thus be seen as a mutual observation of two representations, which helps the representations complement each other and improves the model's ability to characterize depression cues. Furthermore, since the interaction process generates multiple representations, we propose a Multi-representation Prediction (MP) module. This module vectorizes the representations hierarchically, from summarizing each single representation to aggregating multiple representations, and adopts an attention mechanism to estimate an individual's depression severity. Using the RI and MP modules, we construct the Multimodal Local Global Interaction (MLGI) network. Experiments on the AVEC 2013 and AVEC 2014 depression datasets demonstrate the effectiveness of our method.
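The abstract does not give the equations for the RI or MP modules. As a rough illustration only, the Python sketch below assumes that "mutual linear adjustment" means each representation linearly predicts an element-wise scale and shift for the other (a FiLM-style modulation; this is an assumption, not the authors' confirmed formulation). It also assumes that summarizing a single representation is mean pooling over frames, and that the MP attention is a dot-product score against a learned query followed by a linear regressor. All function names (`adjust`, `representation_interaction`, `summarize`, `attention_predict`) and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# (w_scale, b_scale, w_shift, b_shift): hypothetical learned parameters
AdjustParams = Tuple[Matrix, Vector, Matrix, Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def add(x: Vector, y: Vector) -> Vector:
    """Element-wise sum of two vectors."""
    return [xi + yi for xi, yi in zip(x, y)]


def adjust(target: Vector, source: Vector, w_scale: Matrix, b_scale: Vector,
           w_shift: Matrix, b_shift: Vector) -> Vector:
    """Rescale and shift each element of `target` using values predicted
    linearly from `source` (assumed form of 'linear adjustment')."""
    scale = add(matvec(w_scale, source), b_scale)
    shift = add(matvec(w_shift, source), b_shift)
    # The product t * s is the element-wise interaction a weighted sum lacks.
    return [t * s + h for t, s, h in zip(target, scale, shift)]


def representation_interaction(a: Vector, b: Vector, params_ab: AdjustParams,
                               params_ba: AdjustParams) -> Tuple[Vector, Vector]:
    """Hypothetical RI step: each representation is adjusted by the other."""
    a_new = adjust(a, b, *params_ab)  # a observed through b
    b_new = adjust(b, a, *params_ba)  # b observed through a
    return a_new, b_new


def summarize(frames: List[Vector]) -> Vector:
    """Hierarchy level 1 (assumed): mean-pool a (T x d) representation to a d-vector."""
    t = len(frames)
    return [sum(col) / t for col in zip(*frames)]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def attention_predict(vectors: List[Vector], query: Vector,
                      w_out: Vector, b_out: float) -> float:
    """Hierarchy level 2 (assumed): attention-weighted sum of K summary
    vectors, followed by a linear regressor producing a severity score."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    dim = len(query)
    fused = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]
    return sum(wo * f for wo, f in zip(w_out, fused)) + b_out
```

With zero projection matrices, unit scale biases and zero shift biases, `adjust` returns its target unchanged, so the interaction reduces to an identity. With identity scale matrices and zero biases, it reduces to the element-wise product of the two representations, which a linear weighted combination cannot express.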

Original language: English
Pages (from-to): 2649-2664
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 36
Issue number: 2
State: Published - 2026
Externally published: Yes

Keywords

  • Multimodal depression severity estimation
  • interaction
  • local and global representations
