Abstract
Multi-modal relation extraction (MRE) requires integrating multi-modal information to identify relationships between entities. Although fine-grained correlations between visual objects and textual words can improve cross-modal interaction, they are typically modeled only implicitly and are hindered by the modality gap. This paper introduces relational Graph-Bridged cross-modal InTeraction (GBIT), a novel method that explicitly incorporates fine-grained cross-modal correlations into the interaction process. It does so by constructing a fine-grained cross-modal relational graph that serves as a bridge for effective cross-modal interaction across multiple layers. Within GBIT, a gated interaction strategy filters out irrelevant information during the exchange, and an adaptive integration module collates the final information. Extensive experiments on the MRE benchmark demonstrate the superiority of the proposed method.
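The paper's implementation is not reproduced in this record. Purely as an illustration of the idea the abstract describes (a relational graph bridging word and object features, with a gate filtering irrelevant visual information), a single interaction layer might look roughly like the following numpy sketch. All names (`gated_graph_bridge`, `Wm`, `Wg`), shapes, and the exact update rule are assumptions for illustration, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_graph_bridge(text, vis, adj, Wm, Wg):
    """One illustrative layer of graph-bridged cross-modal interaction.

    text: (n_t, d) word features; vis: (n_v, d) visual-object features;
    adj:  (n_t, n_v) edge weights of a hypothetical word-object relational graph.
    """
    # Row-normalize the graph so each word aggregates a convex mix of objects.
    norm = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-8)
    msg = norm @ vis @ Wm  # visual message passed to each word over the graph
    # Sigmoid gate conditioned on word + message: filters irrelevant information.
    gate = sigmoid(np.concatenate([text, msg], axis=1) @ Wg)
    return text + gate * msg  # gated residual update of the text features

# Toy shapes: 5 words, 3 visual objects, feature dim 8.
d, n_t, n_v = 8, 5, 3
text = rng.standard_normal((n_t, d))
vis = rng.standard_normal((n_v, d))
adj = rng.random((n_t, n_v))
Wm = rng.standard_normal((d, d)) * 0.1
Wg = rng.standard_normal((2 * d, d)) * 0.1

out = gated_graph_bridge(text, vis, adj, Wm, Wg)
print(out.shape)  # (5, 8): text features enriched with gated visual messages
```

Stacking several such layers, with an adaptive weighting of the per-layer outputs, would correspond to the multi-layer interaction and adaptive integration the abstract mentions.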
| Original language | English |
|---|---|
| Pages (from-to) | 12647-12651 |
| Number of pages | 5 |
| Journal | Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing |
| State | Published - 2024 |
| Event | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of (14 Apr 2024 - 19 Apr 2024) |
Keywords
- Multimedia
- Relation Extraction
Title
Relational Graph-Bridged Image-Text Interaction: A Novel Method for Multi-Modal Relation Extraction