Effective Video Summarization by Extracting Parameter-Free Motion Attention

  • Tingting Han
  • Quan Zhou
  • Jun Yu*
  • Zhou Yu
  • Jianhui Zhang
  • Sicheng Zhao

*Corresponding author for this work

  • Hangzhou Dianzi University
  • Tsinghua University

Research output: Contribution to journal › Article › peer-review

Abstract

Video summarization remains a challenging task despite increasing research efforts. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking important local motion information that cannot be captured by frame-level video representations. In this article, we propose the Parameter-free Motion Attention Module (PMAM) to exploit the crucial motion clues potentially contained in adjacent video frames, using a multi-head attention architecture. The PMAM requires no additional training for model parameters, leading to an efficient and effective understanding of video dynamics. Moreover, we introduce the Multi-feature Motion Attention Network (MMAN), integrating the PMAM with local and global multi-head attention based on object-centric and scene-centric video representations. The synergistic combination of local motion information, extracted by the proposed PMAM, with long-range interactions modeled by the local and global multi-head attention mechanism, can significantly enhance the performance of video summarization. Extensive experimental results on the benchmark datasets, SumMe and TVSum, demonstrate that the proposed MMAN outperforms other state-of-the-art methods, resulting in remarkable performance gains.
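The abstract does not spell out how the PMAM is computed, so the following is only a minimal sketch of the general idea it names: multi-head attention over adjacent-frame motion features with no learned weights. It assumes that motion is approximated by differences between consecutive frame feature vectors and that queries, keys, and values are all the raw per-head slices of those differences. The function names (`frame_differences`, `parameter_free_attention`) are hypothetical and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def frame_differences(frames):
    """Assumed motion proxy: difference between each pair of adjacent frames.

    `frames` is a list of T feature vectors; the result has T - 1 vectors.
    """
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def parameter_free_attention(features, num_heads):
    """Multi-head scaled dot-product attention without learned projections.

    Each feature vector is split into `num_heads` equal slices. Within a
    head, the slice itself serves as query, key, and value, so there is
    nothing to train. Head outputs are concatenated back together.
    """
    dim = len(features[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    scale = math.sqrt(head_dim)

    outputs = [[] for _ in features]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        heads = [f[lo:hi] for f in features]
        for i, query in enumerate(heads):
            # Similarity of this time step to every other time step.
            scores = [
                sum(q * k for q, k in zip(query, key)) / scale
                for key in heads
            ]
            weights = softmax(scores)
            # Weighted average of the value slices.
            attended = [
                sum(w * value[j] for w, value in zip(weights, heads))
                for j in range(head_dim)
            ]
            outputs[i].extend(attended)
    return outputs
```

Calling `parameter_free_attention(frame_differences(frames), num_heads=2)` on a list of T frame features returns T - 1 attended motion vectors of the same dimension. How the published PMAM actually derives its motion features and combines them with the local and global attention branches of MMAN is not described in the abstract and is not reproduced here.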

Original language: English
Article number: 219
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Volume: 20
Issue number: 7
DOIs
State: Published - 16 May 2024
Externally published: Yes

Keywords

  • Video summarization
  • feature fusion
  • motion attention
  • multi-head attention
  • parameter-free
