
Boosting Discriminability for Robust Multimodal Entity Linking with Visual Modality Missing

  • Mingrui Lao
  • Zheng Li
  • Yanming Guo*
  • Xueyi Zhang*
  • Siqi Cai
  • Zhaoyun Ding
  • Haizhou Li
  • *Corresponding author for this work
  • National University of Defense Technology
  • National University of Singapore
  • National Key Laboratory of Information Systems Engineering
  • The Chinese University of Hong Kong, Shenzhen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multimodal Entity Linking (MEL) aims to link ambiguous mentions within multimodal contexts to their referent entities in a multimodal knowledge base, typically under the assumption of modality completeness. However, when deployed in open-world applications, MEL systems may encounter user-provided mentions whose visual modality is unpredictably missing. In this paper, we propose a novel setting, dubbed MEL-MM, to simulate this practical challenge, and reveal that semantic discriminability is a crucial factor in enhancing resilience to missing modalities. To this end, we introduce an innovative yet efficient approach termed Cross-View Introspective Ranking Distillation (CVIRD), which seeks to sufficiently align the linking similarities between teacher and student models trained on modality-complete and modality-incomplete data, respectively. Specifically, as the first component of CVIRD, Missing-Aware Ranking Distillation (MARD) models discriminability by formulating the similarity rankings between a mention and entities in a missing-sensitive and differentiable manner. Moreover, the second component, Cross-View Distillation with Introspection (CVDI), aims to improve discriminability extraction in MARD through multi-level distillation that considers both cross-view retrieval and self-consistency. Experiments verify the effectiveness and model-agnostic nature of our method, which achieves superior performance compared with competitive missingness-resilient strategies.
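
The general idea of aligning teacher and student linking similarities can be illustrated with a generic listwise distillation loss. The sketch below is not the CVIRD, MARD, or CVDI formulation, which this record does not specify. It assumes a plain softmax over mention-entity similarity scores and a KL divergence between the teacher distribution (computed with the image present) and the student distribution (computed with the image missing). The function names, the temperature parameter, and the example scores are all hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn similarity scores into a distribution over candidate entities."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ranking_distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over softmaxed mention-entity similarities.

    Assumption: a generic soft-ranking surrogate, chosen because it is
    differentiable in the student scores. It is not the paper's loss.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical similarities between one mention and three candidate entities.
teacher = [0.9, 0.2, 0.1]        # teacher sees text and image
student_close = [0.8, 0.3, 0.1]  # student (image missing) keeps the ranking
student_far = [0.1, 0.2, 0.9]    # student (image missing) reverses the ranking

loss_close = ranking_distillation_loss(teacher, student_close)
loss_far = ranking_distillation_loss(teacher, student_far)
```

Under these assumptions, a student whose ranking of candidates matches the teacher's incurs a smaller loss than one whose ranking is reversed, so minimizing the loss pushes the modality-incomplete student toward the teacher's ordering of entities.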

Original language: English
Title of host publication: SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 989-999
Number of pages: 11
ISBN (Electronic): 9798400715921
DOIs
State: Published - 13 Jul 2025
Externally published: Yes
Event: 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025 - Padua, Italy
Duration: 13 Jul 2025 to 18 Jul 2025

Publication series

Name: SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025
Country/Territory: Italy
City: Padua
Period: 13/07/25 to 18/07/25

Keywords

  • Information Retrieval
  • Multimodal Entity Linking
  • Multimodal Learning with Modality Missing
