TY - GEN
T1 - Boosting Discriminability for Robust Multimodal Entity Linking with Visual Modality Missing
AU - Lao, Mingrui
AU - Li, Zheng
AU - Guo, Yanming
AU - Zhang, Xueyi
AU - Cai, Siqi
AU - Ding, Zhaoyun
AU - Li, Haizhou
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/7/13
Y1 - 2025/7/13
N2 - Multimodal Entity Linking (MEL) aims to link ambiguous mentions within multimodal contexts to their referent entities in a multimodal knowledge base, typically under the assumption of modality completeness. However, when deployed in open-world applications, MEL systems may encounter user-provided mentions whose visual modality is unpredictably missing. In this paper, we propose a novel setting dubbed MEL-MM to simulate this practical challenge, and reveal that semantic discriminability is a crucial factor in enhancing resilience to missing modalities. To this end, we introduce an innovative yet efficient approach termed Cross-View Introspective Ranking Distillation (CVIRD), which seeks to align the linking similarities between teacher and student models trained on modality-complete and modality-incomplete data. Specifically, as the first component of CVIRD, Missing-Aware Ranking Distillation (MARD) models discriminability by formulating the similarity rankings between mentions and entities in a missing-sensitive and differentiable manner. Moreover, the second component, Cross-View Distillation with Introspection (CVDI), aims to improve discriminability extraction in MARD through multi-level distillation that considers both cross-view retrieval and self-consistency. Experiments verify the effectiveness and model-agnostic nature of our method, which outperforms competitive missingness-resilient strategies.
KW - Information Retrieval
KW - Multimodal Entity Linking
KW - Multimodal Learning with Modality Missing
UR - https://www.scopus.com/pages/publications/105011817439
U2 - 10.1145/3726302.3729906
DO - 10.1145/3726302.3729906
M3 - Conference contribution
AN - SCOPUS:105011817439
T3 - SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 989
EP - 999
BT - SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025
Y2 - 13 July 2025 through 18 July 2025
ER -