
Adversarial temporal sentence grounding by learning from external data

  • Tingting Han*
  • Kai Wang
  • Jun Yu
  • Sicheng Zhao
  • Jianping Fan
  • *Corresponding author for this work
  • Hangzhou Dianzi University
  • Tsinghua University

Research output: Contribution to journal (article, peer-reviewed)

Abstract

Temporal sentence grounding (TSG) aims to localize the temporal moment in an untrimmed video that semantically corresponds to a given natural language query. Considerable effort has gone into solving the problem in both fully supervised and weakly supervised settings. However, fully supervised methods rely heavily on manually annotated start and end timestamps, which are arduous to obtain, while weakly supervised methods perform worse because they lack that supervision. In this paper, we propose to address temporal sentence grounding by exploiting external data. Specifically, we design an Adversarial Temporal Sentence Grounding (ATSG) framework comprising a proposal generator and a semantic discriminator, the latter of which is first pre-trained on external data. Benefiting from the pre-training, the semantic discriminator is able to distinguish cross-modal semantic similarities and encourages the proposal generator to produce more accurate candidates. In addition, we use an adversarial training process in the joint optimization stage, in which the proposal generator and the semantic discriminator compete alternately, ultimately leading to improved TSG performance. We conduct extensive experiments on two public benchmarks, ActivityNet Captions and Charades-STA, and the results demonstrate that the proposed ATSG network achieves state-of-the-art performance.

Original language: English
Article number: 111621
Journal: Pattern Recognition
Volume: 165
State: Published - Sep 2025
Externally published: Yes

Keywords

  • Adversarial training
  • Cross-modal alignment
  • External data
  • Temporal sentence grounding
