
ILD: Image-Level Labels Driven Active Learning Object Detection

  • Rui Tian
  • Jiaxuan Zhang
  • Yongqiang Zhang*
  • Man Zhang
  • Zian Zhang
  • Yin Zhang
  • Yongqiang Li
  • Wangmeng Zuo
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Inner Mongolia University
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Existing state-of-the-art (SOTA) methods in active learning object detection (ALOD) achieve impressive results, but they overlook two problems: 1) the requirement for instance-level labels during initialization, and 2) the constrained localization ability of the pre-trained fully supervised detector in the active learning phase. Problem 1) contradicts the fundamental purpose of active learning, which is to balance annotation cost against detection performance. Problem 2) arises because the active learning process relies on a single pre-trained fully supervised detector. To tackle these problems, we propose Image-level Labels Driven active learning object detection (termed ILD). Specifically, we propose a chain-of-thought-based multi-step reasoning process that uses only image-level labels, comprising a class-number-aware step and an iterative step, to enhance the detection ability of a vision-language model (VLM). The detection results of the VLM and a weakly supervised detector are used as pseudo ground-truth boxes to initialize a fully supervised detector during active learning (AL) initialization. Thus, the initialization process of ILD eliminates the requirement for instance-level labels. In the active learning stage, we design two novel acquisition functions, one for uncertainty and one for diversity, to select the most informative images based on collaborative outputs from both the weakly supervised detector and the pre-trained fully supervised detector. The collaborative mechanism jointly measures the uncertainty of the two detectors and the diversity of object features, thereby enhancing localization quality. Extensive experiments demonstrate that the proposed ILD achieves state-of-the-art performance (i.e., 77.5%, 25.7%, and 27.9%) on the PASCAL VOC2007, MS COCO2014, and MS COCO2017 datasets, surpassing the SOTA methods by 3.4%, 1.2%, and 4.7%, respectively. Our code is publicly available at https://github.com/RuiTianHIT/ILD
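
The idea of scoring images by the joint uncertainty of two detectors plus the diversity of their features can be illustrated with a small sketch. This is not the acquisition function from the paper, whose exact form the abstract does not give. It assumes, purely for illustration, that uncertainty is approximated by localization disagreement (one minus the mean best IoU between the two detectors' boxes) and that diversity is the distance to the nearest already selected feature vector. The names `disagreement`, `select_images`, and `diversity_weight` are hypothetical.

```python
from math import dist


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def disagreement(weak_boxes, full_boxes):
    """Assumed uncertainty proxy: 1 - mean best IoU between the two detectors."""
    if not weak_boxes or not full_boxes:
        return 1.0  # one detector found nothing: treat as maximally uncertain
    best = [max(iou(f, w) for w in weak_boxes) for f in full_boxes]
    return 1.0 - sum(best) / len(best)


def select_images(pool, budget, diversity_weight=0.5):
    """Greedily pick `budget` image ids by uncertainty plus a diversity bonus.

    `pool` maps image id -> (weak_boxes, full_boxes, feature_vector).
    """
    selected = []
    remaining = dict(pool)
    while remaining and len(selected) < budget:

        def score(image_id):
            weak, full, feat = remaining[image_id]
            # Diversity: distance to the nearest already selected feature
            # (zero bonus while nothing has been selected yet).
            novelty = min((dist(feat, pool[s][2]) for s in selected), default=0.0)
            return disagreement(weak, full) + diversity_weight * novelty

        best_id = max(remaining, key=score)
        selected.append(best_id)
        del remaining[best_id]
    return selected
```

With a pool of unlabeled images, `select_images(pool, budget)` returns the ids to send for annotation in the next round. The greedy loop recomputes the diversity term after each pick, so images that resemble ones already chosen are penalized even when the two detectors disagree strongly on them.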

Original language: English
Pages (from-to): 2814-2829
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 36
Issue number: 3
DOIs
State: Published - 2026
Externally published: Yes

Keywords

  • Active learning object detection
  • acquisition function
  • large-scale vision-language pre-trained model
  • weakly supervised object detection
