Multimodal Evidential Learning for Open-World Weakly-Supervised Video Anomaly Detection

  • Chao Huang
  • Weiliang Huang
  • Qiuping Jiang*
  • Wei Wang
  • Jie Wen
  • Bob Zhang

*Corresponding author for this work

  • Sun Yat-Sen University
  • University of Macau
  • Ningbo University
  • Harbin Institute of Technology Shenzhen

Research output: Contribution to journal › Article › peer-review

Abstract

Weakly-supervised video anomaly detection aims to detect abnormal events in videos using only coarse-grained labels, and it has been applied successfully in many real-world settings. However, most existing methods are effective only for specific objects in specific scenarios, which makes them prone to misclassifying or missing previously unseen anomalies. Compared with conventional anomaly detection, Open-world Weakly-supervised Video Anomaly Detection (OWVAD) is more challenging because unknown anomalies have neither labels nor fine-grained annotations. To address this problem, we propose a multi-scale evidential vision-language model for open-world video anomaly detection. Specifically, we leverage the generalized vision-language associations learned by CLIP to exploit large pre-trained models for the OWVAD task. We then integrate a multi-scale temporal modeling module with a multimodal evidence collector to achieve precise frame-level detection of both seen and unseen anomalies. Extensive experiments on two widely used benchmarks validate the effectiveness of our method. The code will be made publicly available.
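
The abstract does not spell out how the evidence collector is formulated. As a rough illustration only, the sketch below uses the generic evidential deep learning recipe (non-negative evidence parameterizing a Dirichlet distribution), which is an assumption and not necessarily the authors' design. The function names `softplus` and `evidential_scores`, and the example logits, are hypothetical; in practice the logits might come from similarities between frame features and text prompts.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus; maps any real logit to non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_scores(logits):
    """Convert per-class logits for one frame into beliefs and an uncertainty mass.

    Generic evidential formulation (assumed here, not taken from the paper):
      evidence e_k = softplus(logit_k), alpha_k = e_k + 1, S = sum_k alpha_k,
      belief b_k = e_k / S, uncertainty u = K / S, so that sum_k b_k + u = 1.
    """
    if not logits:
        raise ValueError("logits must contain at least one class score")
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = len(logits) / strength
    return beliefs, uncertainty


# Hypothetical two-class logits (normal, abnormal) for two frames.
frame_logits = [
    [4.0, -2.0],  # strong evidence for one class
    [0.1, 0.2],   # weak evidence for both classes
]
for logits in frame_logits:
    beliefs, u = evidential_scores(logits)
    print([round(b, 3) for b in beliefs], round(u, 3))
```

Under this formulation, a frame with little evidence for any known class receives a high uncertainty mass, which is one plausible signal for flagging unseen anomalies in an open-world setting.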

Original language: English
Pages (from-to): 3132-3143
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Volume: 27
State: Published - 2025
Externally published: Yes

Keywords

  • Video anomaly detection
  • evidential learning
  • vision-language model
