
Self-Relevance-Based Multimodal In-Context Learning for Multimodal Named Entity Recognition

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Qilu University of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recently, Multimodal Named Entity Recognition (MNER) has attracted significant attention. Although MNER with in-context learning has shown improved performance, modality retrieval bias often diminishes the relevance of the selected in-context examples. To address this issue, we propose a self-relevance-based multimodal in-context learning method that mitigates modality retrieval bias by dynamically adjusting the weight of each modality. Specifically, we first measure the self-relevance of the query by computing the similarity between its textual and visual modalities, which assesses how much the visual information contributes to the textual context. We then rank candidate examples by similarity in each modality, adjust the image rankings according to self-relevance to reduce modality retrieval bias, and integrate the rankings to select the k most relevant examples. Finally, we provide the task definition and the retrieved examples as guidance to a Multimodal Large Language Model to obtain its predictions. Experimental results demonstrate that our method achieves state-of-the-art performance on two benchmark datasets.
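The abstract's retrieval step can be illustrated with a minimal sketch: score each candidate example by combining text similarity with image similarity, where the image term is down-weighted by the query's own text-image agreement (its self-relevance). The function and variable names, the cosine scoring, and the specific linear weighting below are assumptions for illustration, not the paper's actual formulation.

```python
# Hedged sketch of self-relevance-weighted example retrieval.
# The weighting scheme (text_sim + self_rel * img_sim) is an assumption;
# the paper combines modality-specific rankings rather than raw scores.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_examples(q_text, q_img, cand_texts, cand_imgs, k=2):
    # Self-relevance of the query: how strongly its image agrees with its text.
    self_rel = cosine(q_text, q_img)
    scores = []
    for t, v in zip(cand_texts, cand_imgs):
        text_sim = cosine(q_text, t)
        img_sim = cosine(q_img, v)
        # When the query's image is only weakly related to its text,
        # the visual similarity contributes less to the final score,
        # mitigating modality retrieval bias.
        scores.append(text_sim + self_rel * img_sim)
    # Indices of the k highest-scoring candidate examples.
    order = np.argsort(scores)[::-1]
    return order[:k].tolist()
```

The selected indices would then point to demonstration examples placed, together with the task definition, into the prompt for the multimodal LLM.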

Original language: English
Title of host publication: 2025 IEEE International Conference on Multimedia and Expo
Subtitle of host publication: Journey to the Center of Machine Imagination, ICME 2025 - Conference Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331594954
DOIs
State: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025 - Nantes, France
Duration: 30 Jun 2025 - 4 Jul 2025

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Country/Territory: France
City: Nantes
Period: 30/06/25 - 4/07/25

Keywords

  • Multimodal information extraction
  • data mining
  • in-context learning
  • multimodal large language model

