
Multimodal-guided mixture-of-experts bias removal strategy for natural language video localization

  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Datasets for natural language video localization are built by relabeling data from other tasks, which introduces severe biases that hinder effective model training. Current methods primarily address distributional and modal biases in these datasets but lack a comprehensive solution for the two types of annotation bias introduced during labeling. To tackle this problem, we propose a multimodal-guided mixture-of-experts bias removal strategy. The method simulates diverse query statements by injecting Gaussian noise, employs multiple general experts to mimic different annotation tendencies, and uses a shared expert to extract features common to the annotation process, thereby addressing the uncertainty in target moment annotations. To balance the contributions of the experts, we introduce auxiliary losses: an importance loss, a load loss, and a KL divergence loss. Extensive experiments on two widely used datasets, Charades-STA and ActivityNet Captions, with the strategy applied to four backbone networks, demonstrate the effectiveness of our approach.
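The abstract names the components but not their exact definitions, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. Assumptions (all hypothetical): query and video features are fixed-size vectors; the "multimodal-guided" gate is a linear router over the concatenated query and video features; each general expert and the shared expert is a single linear map; the noise standard deviation is 0.1 and routing is top-2. The importance and load losses are swapped in from the standard sparsely-gated MoE formulation (squared coefficient of variation of per-expert gate mass and per-expert sample counts), and the KL term is assumed to compare the batch-averaged gate distribution with a uniform distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cv_squared(x, eps=1e-10):
    # Squared coefficient of variation: 0 when all experts receive equal mass.
    return float(x.var() / (x.mean() ** 2 + eps))

def moe_forward(query, video, params, top_k=2, noise_std=0.1, rng=rng):
    """Hypothetical sketch. query, video: (B, D) feature arrays."""
    # 1) Simulate diverse query phrasings by adding Gaussian noise to query features.
    noisy_query = query + rng.normal(0.0, noise_std, size=query.shape)
    # 2) Gate conditioned on both modalities (assumed: concatenation + linear router).
    fused = np.concatenate([noisy_query, video], axis=-1)          # (B, 2D)
    probs = softmax(fused @ params["W_gate"])                       # (B, E)
    # Keep the top-k experts per sample and renormalise their weights.
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]               # (B, k)
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, topk_idx, 1.0, axis=-1)
    gates = probs * mask
    gates = gates / gates.sum(axis=-1, keepdims=True)
    # 3) General experts (linear maps here) stand in for different annotation tendencies.
    expert_out = np.einsum("bd,edh->beh", fused, params["W_experts"])  # (B, E, H)
    general = np.einsum("be,beh->bh", gates, expert_out)               # (B, H)
    # 4) The shared expert is always active and models common annotation features.
    shared = fused @ params["W_shared"]                                 # (B, H)
    output = general + shared

    # Auxiliary losses (assumed forms, see lead-in).
    importance = probs.sum(axis=0)          # total gate probability per expert
    load = mask.sum(axis=0)                 # samples routed to each expert
    mean_probs = probs.mean(axis=0)
    num_experts = probs.shape[-1]
    # KL(mean gate distribution || uniform) = sum p * log(p * E)
    kl = float(np.sum(mean_probs * np.log(mean_probs * num_experts + 1e-10)))
    aux = {"importance": cv_squared(importance), "load": cv_squared(load), "kl": kl}
    return output, aux

# Usage with random weights: batch 4, feature dim 8, hidden dim 16, 4 general experts.
B, D, H, E = 4, 8, 16, 4
params = {
    "W_gate": rng.normal(size=(2 * D, E)),
    "W_experts": rng.normal(size=(E, 2 * D, H)),
    "W_shared": rng.normal(size=(2 * D, H)),
}
out, aux = moe_forward(rng.normal(size=(B, D)), rng.normal(size=(B, D)), params)
print(out.shape, aux)
```

In a training loop, the auxiliary terms would typically be added to the localization loss with small weights so that no single general expert absorbs all samples, while the shared expert keeps learning the features common to every annotation style.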

Original language: English
Article number: 61
Journal: Multimedia Systems
Volume: 32
Issue number: 1
DOIs
State: Published - Feb 2026
Externally published: Yes

Keywords

  • Auxiliary Loss
  • Mixture-of-experts
  • Multimodal-guided
  • Natural Language Video Localization
