
Spiking Variational Graph Representation Inference for Video Summarization

  • Wenrui Li
  • Wei Han
  • Liang Jian Deng
  • Ruiqin Xiong
  • Xiaopeng Fan*
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • University of Electronic Science and Technology of China
  • Peking University

Research output: Contribution to journal › Article › peer-review

Abstract

With the rise of short video content, efficient video summarization techniques for extracting key information have become crucial. However, existing methods struggle to capture global temporal dependencies and to maintain the semantic coherence of video content. They are also susceptible to noise during multi-channel feature fusion. We propose a Spiking Variational Graph (SpiVG) Network, which enhances information density and reduces computational complexity. First, we design a keyframe extractor based on Spiking Neural Networks (SNNs), leveraging their event-driven computation mechanism to learn keyframe features autonomously. To enable fine-grained and adaptable reasoning across video frames, we introduce a Dynamic Aggregation Graph Reasoner, which decouples contextual object consistency from semantic perspective coherence. We present a Variational Inference Reconstruction Module to address the uncertainty and noise that arise during multi-channel feature fusion. In this module, we employ Evidence Lower Bound (ELBO) optimization to capture the latent structure of multi-channel feature distributions, using posterior distribution regularization to reduce overfitting. Experimental results show that SpiVG surpasses existing methods on multiple datasets, including SumMe, TVSum, VideoXum, and QFVS.
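
As a rough illustration of the ELBO objective mentioned above, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the distributions involved, so the sketch assumes a diagonal Gaussian posterior, a standard normal prior, and a unit-variance Gaussian likelihood. The function names and the `beta` weight are hypothetical.

```python
import math


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Assumption: diagonal Gaussian posterior and standard normal prior.
    This term plays the role of the posterior regularizer.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def negative_elbo(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO, up to an additive constant.

    Assumption: unit-variance Gaussian likelihood, so the negative
    log-likelihood reduces to half the squared reconstruction error.
    `beta` is a hypothetical weight on the KL term.
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, log_var)
```

Minimizing this quantity maximizes the ELBO: the first term rewards faithful reconstruction of the fused features, while the KL term keeps the approximate posterior close to the prior, which is one common way to limit overfitting.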

Original language: English
Pages (from-to): 5697-5709
Number of pages: 13
Journal: IEEE Transactions on Image Processing
Volume: 34
State: Published - 2025

Keywords

  • Graph representation learning
  • spiking neural network
  • video summarization
