
DiCLIP: Integrating DINOv2 and CLIP for Zero-Shot and Few-Shot Anomaly Detection with Versatile Combination Prompts

  • Xinxu Cai
  • Lihang Sun
  • Zhenshen Qu*
  • Jiazheng Xu
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Xi'an Jiaotong University

Research output: Contribution to journal › Conference article › peer-review

Abstract

Industrial Anomaly Detection (AD) faces a significant cold-start challenge because it requires a large number of labeled normal samples, which are often difficult to obtain on new production lines. Although Zero-Shot Anomaly Detection (ZSAD) and Few-Shot Anomaly Detection (FSAD) have been proposed as potential solutions, existing methods generalize poorly and are prone to contamination by anomalies in few-shot scenarios. To address these issues, we propose DiCLIP, a unified framework that integrates ZSAD and FSAD through three key innovations. First, Versatile Combination Prompts Learning combines static, dynamic, and anomaly-sensitive prompts to leverage textual anomaly cues together with image features for accurate anomaly localization. Second, the Anomaly-Aware Memory Bank utilizes ZSAD priors to filter contaminated features, enabling anomaly detection based on a small number of anomaly samples. Third, Adaptive Threshold Optimization integrates semantic alignment from ZSAD with feature matching from FSAD to remove the constraint of a single uniform threshold across test images, thereby achieving higher-precision segmentation and localization. Extensive experiments on the standard MVTec and VisA benchmark datasets demonstrate the superior performance of DiCLIP, highlighting its effectiveness and practical value for industrial deployment.
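
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind filtering a few-shot memory bank with zero-shot priors, not the authors' method. It assumes that each reference patch has a feature vector and a zero-shot anomaly score in [0, 1]. The function names, the cutoff `max_score`, the cosine-distance matching, and the fusion `weight` are all hypothetical choices made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_memory_bank(patch_features, zero_shot_scores, max_score=0.5):
    """Keep reference patches whose zero-shot anomaly score is low.

    max_score is a hypothetical cutoff; the abstract does not state one.
    """
    bank = [f for f, s in zip(patch_features, zero_shot_scores) if s <= max_score]
    if not bank:
        raise ValueError("all reference patches were filtered out")
    return bank


def few_shot_scores(test_features, memory_bank):
    """Score each test patch by its distance to the nearest stored patch."""
    return [min(cosine_distance(f, m) for m in memory_bank) for f in test_features]


def fuse_scores(zero_shot, few_shot, weight=0.5):
    """Blend zero-shot and few-shot scores (assumed: a plain weighted mean)."""
    return [weight * z + (1.0 - weight) * f for z, f in zip(zero_shot, few_shot)]


# Toy data: the third reference patch is "contaminated" (high zero-shot score).
reference = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
reference_priors = [0.1, 0.2, 0.9]
bank = build_memory_bank(reference, reference_priors)

test_patches = [[1.0, 0.0], [0.0, 1.0]]
scores = few_shot_scores(test_patches, bank)
fused = fuse_scores([0.1, 0.8], scores)
```

In this toy case the contaminated patch `[0.0, 1.0]` is excluded from the bank, so a test patch that resembles it keeps a large distance score. Had the patch stayed in the bank, the same test patch would have matched it exactly and been scored as normal.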

Keywords

  • Anomaly Detection
  • Few-shot Learning
  • Prompt Learning
  • Zero-shot Learning
