Relational Graph-Bridged Image-Text Interaction: A Novel Method for Multi-Modal Relation Extraction

  • Zihao Zheng
  • Tao He
  • Ming Liu*
  • Zhongyuan Wang
  • Ruiji Fu
  • Bing Qin

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Multi-modal relation extraction (MRE) requires integrating multi-modal information to identify relationships between entities. Although fine-grained correlations between visual objects and textual words have the potential to improve cross-modal interaction, they are typically modeled only implicitly and are hindered by the modality gap. This paper introduces relational Graph-Bridged cross-modal InTeraction (GBIT), a novel method that explicitly incorporates fine-grained cross-modal correlations into the interaction process. It does so by constructing a fine-grained cross-modal relational graph that serves as a bridge for effective cross-modal interaction across multiple layers. Within GBIT, a gated interaction strategy filters out irrelevant information during the exchange, and an adaptive integration module collates the final information. Extensive experiments on the MRE benchmark demonstrate the superiority of the proposed method.
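The gated interaction idea described above can be illustrated with a minimal sketch. All names, shapes, and the gating formula here are illustrative assumptions, not the authors' implementation: a learned sigmoid gate, conditioned on an aligned word/object feature pair, decides how much of the visual signal is allowed to flow into the textual representation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_interaction(text_feat, visual_feat, W_g, b_g):
    """Gate that filters irrelevant visual information before fusion.

    text_feat, visual_feat: (d,) features for an aligned word/object pair.
    W_g: (d, 2d) gate weights; b_g: (d,) gate bias (hypothetical parameters).
    """
    # Gate is computed from the concatenated pair; values near 0
    # suppress the visual signal for irrelevant objects.
    gate = sigmoid(W_g @ np.concatenate([text_feat, visual_feat]) + b_g)
    return text_feat + gate * visual_feat

# Toy usage with random features and small random weights.
rng = np.random.default_rng(0)
d = 4
t = rng.normal(size=d)
v = rng.normal(size=d)
W = rng.normal(size=(d, 2 * d)) * 0.1
b = np.zeros(d)
fused = gated_interaction(t, v, W, b)
print(fused.shape)  # (4,)
```

In the actual method this exchange is repeated over multiple layers and bridged by the cross-modal relational graph; the sketch only shows the pairwise gating step.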

Original language: English
Pages (from-to): 12647-12651
Number of pages: 5
Journal: Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
State: Published - 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 - 19 Apr 2024

Keywords

  • Multimedia
  • Relation Extraction
