Semantic Collaborative Learning for Cross-Modal Moment Localization

  • Yupeng Hu
  • Kun Wang
  • Meng Liu
  • Haoyu Tang*
  • Liqiang Nie*
  • *Corresponding author for this work
  • Shandong University
  • Shandong Jianzhu University
  • Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Localizing a desired moment within an untrimmed video via a given natural language query, i.e., cross-modal moment localization, has attracted widespread research attention recently. However, it is a challenging task because it requires not only accurately understanding intra-modal semantic information, but also explicitly capturing inter-modal semantic correlations (consistency and complementarity). Existing efforts mainly focus on intra-modal semantic understanding and inter-modal semantic alignment, while ignoring necessary semantic supplement. Consequently, we present a cross-modal semantic perception network for more effective intra-modal semantic understanding and inter-modal semantic collaboration. Concretely, we design a dual-path representation network for intra-modal semantic modeling. Meanwhile, we develop a semantic collaborative network to achieve multi-granularity semantic alignment and hierarchical semantic supplement. Thereby, effective moment localization can be achieved based on sufficient semantic collaborative learning. Extensive comparison experiments demonstrate the promising performance of our model compared with existing state-of-the-art competitors.

Original language: English
Article number: 50
Journal: ACM Transactions on Information Systems
Volume: 42
Issue number: 2
State: Published - 7 Nov 2023
Externally published: Yes

Keywords

  • Cross-modal moment localization
  • inter-modal semantic collaboration
  • intra-modal semantic understanding
