
C2F: An effective coarse-to-fine network for video summarization

  • Faculty of Computing, Harbin Institute of Technology
  • Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

The objective of video summarization is to produce a concise summary that accurately captures the content of the original video. Existing supervised methods treat the task as a sequence-to-sequence problem. However, modeling long video sequences presents three challenges: (1) local and global relationships are difficult to capture simultaneously; (2) the boundaries of video highlight segments are often located incorrectly, which leaves the semantic integrity of the segments incomplete; (3) relation computing is difficult to perform efficiently. To address these limitations, we design a novel coarse-to-fine network (C2F) for video summarization that is adapted to the multi-level semantic structure of video. The multiscale representation scheme first captures temporal relationships at different scales to produce coarse classification results; meanwhile, the action-wise proposal module provides fine predictions of importance scores and regresses the temporal locations of key-frames. In addition, a loss function is proposed to identify local differences among frames, and combinations of various loss functions are analyzed. Extensive experimental results on two benchmark datasets demonstrate that the proposed C2F achieves significant performance improvements over state-of-the-art methods and performs satisfactorily in efficient relation computing. For example, on the TVSum dataset, we improve the F-score by 3.4 percentage points, from 69.4% to 72.8%. Furthermore, C2F has 4.7M parameters, only 10.7% of the parameters used in the SASUM model.

Original language: English
Article number: 104962
Journal: Image and Vision Computing
Volume: 144
State: Published - Apr 2024
Externally published: Yes

Keywords

  • Coarse-to-fine network
  • Local adaptive loss
  • Multiscale representation
  • Video summarization
