Abstract
Video summarization remains a challenging task despite increasing research efforts. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking important local motion information that cannot be captured by frame-level video representations. In this article, we propose the Parameter-free Motion Attention Module (PMAM), which uses a multi-head attention architecture to exploit the motion cues contained in adjacent video frames. The PMAM requires no additional training of model parameters, which makes it an efficient way to capture video dynamics. We also introduce the Multi-feature Motion Attention Network (MMAN), which integrates the PMAM with local and global multi-head attention over object-centric and scene-centric video representations. Combining the local motion information extracted by the PMAM with the long-range interactions modeled by local and global multi-head attention significantly improves video summarization performance. Extensive experiments on the benchmark datasets SumMe and TVSum demonstrate that the proposed MMAN outperforms other state-of-the-art methods.
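The abstract does not spell out how the PMAM is computed, so the following is only a minimal sketch of one way a parameter-free multi-head attention over adjacent frames could look, not the authors' implementation. It assumes that the frame features themselves serve as queries, keys, and values (no learned projections), that heads are formed by splitting the feature channels, and that each frame attends only to neighbours within a small window. The function name `parameter_free_local_attention` and the parameters `num_heads` and `window` are hypothetical.

```python
import numpy as np


def parameter_free_local_attention(frames, num_heads=4, window=1):
    """Illustrative sketch only; not the PMAM from the paper.

    frames: (T, D) array of per-frame features.
    Returns the attended features (T, D) and the weights (H, T, T).
    """
    T, D = frames.shape
    if D % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    d = D // num_heads

    # Split channels into heads: (H, T, d). Assumption: no learned
    # Q/K/V projections, so the module has no trainable parameters.
    heads = frames.reshape(T, num_heads, d).transpose(1, 0, 2)

    # Scaled dot-product similarity between every pair of frames.
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(d)  # (H, T, T)

    # Band mask: frame i may only attend to frames j with |i - j| <= window.
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted sum of neighbouring frame features, then merge the heads.
    out = weights @ heads  # (H, T, d)
    return out.transpose(1, 0, 2).reshape(T, D), weights
```

Calling the function on a `(T, D)` feature matrix returns an output of the same shape, which could then be combined with other attention branches. The band mask is what restricts the interaction to adjacent frames in this sketch; how the actual module encodes motion may differ.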
| Field | Value |
|---|---|
| Original language | English |
| Article number | 219 |
| Journal | ACM Transactions on Multimedia Computing, Communications and Applications |
| Volume | 20 |
| Issue number | 7 |
| State | Published - 16 May 2024 |
| Externally published | Yes |
Keywords
- Video summarization
- feature fusion
- motion attention
- multi-head attention
- parameter-free
Article title: Effective Video Summarization by Extracting Parameter-Free Motion Attention