Abstract
Existing SOTA methods in active learning object detection (ALOD) achieve impressive results, but they overlook two problems: 1) the requirement for instance-level labels during initialization, and 2) the constrained localization ability of the pre-trained fully supervised detector in the active learning phase. Problem 1) contradicts the fundamental purpose of active learning, which is to balance annotation cost against detection performance. Problem 2) arises because the active learning process relies on a single pre-trained fully supervised detector. To tackle these problems, we propose Image-level Labels Driven active learning object detection (termed ILD). Specifically, we propose a chain-of-thought-based multi-step reasoning process that uses only image-level labels, comprising a class-number-aware step and an iterative step, to enhance the detection ability of the vision-language model (VLM). The detection results of the VLM and a weakly supervised detector are used as pseudo ground-truth boxes to initialize a fully supervised detector during AL initialization. Thus, the initialization process of ILD eliminates the requirement for instance-level labels. In the active learning stage, we design two novel acquisition functions, one for uncertainty and one for diversity, that select the most informative images based on collaborative outputs from both the weakly supervised detector and the pre-trained fully supervised detector. The collaborative mechanism jointly measures the uncertainty of the two detectors and the diversity of object features, thereby enhancing localization quality. Extensive experiments demonstrate that the proposed ILD achieves state-of-the-art performance (i.e., 77.5%, 25.7%, and 27.9%) on the PASCAL VOC2007, MS COCO2014, and MS COCO2017 datasets, surpassing the SOTA methods by 3.4%, 1.2%, and 4.7%, respectively. Our code is publicly available at https://github.com/RuiTianHIT/ILD
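The abstract does not give the form of the two acquisition functions, so the following is only a generic sketch of how an uncertainty score from two detectors might be combined with a diversity criterion. It is not the ILD method. Everything in it is an assumption made for illustration: the names `disagreement` and `select_images`, the use of Jensen-Shannon divergence between per-image class-probability vectors as the uncertainty signal, the greedy farthest-point rule for diversity, and the `pool_factor` parameter.

```python
import math


def disagreement(p, q):
    """Jensen-Shannon divergence between two class-probability vectors.

    Hypothetical uncertainty proxy: large when the two detectors disagree.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_images(weak_probs, full_probs, features, budget, pool_factor=2):
    """Pick `budget` image indices: uncertain first, then spread out.

    weak_probs / full_probs: per-image class probabilities from the two
    detectors (assumed inputs). features: per-image feature vectors.
    """
    if budget <= 0 or not weak_probs:
        return []

    # Step 1 (uncertainty): rank images by detector disagreement.
    scores = [disagreement(p, q) for p, q in zip(weak_probs, full_probs)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[: pool_factor * budget]

    # Step 2 (diversity): greedy farthest-point selection in feature space,
    # starting from the most uncertain candidate.
    chosen = [candidates[0]]
    while len(chosen) < min(budget, len(candidates)):
        remaining = [i for i in candidates if i not in chosen]
        best = max(
            remaining,
            key=lambda i: min(math.dist(features[i], features[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen
```

A two-stage design like this keeps the diversity step cheap, since distances are computed only within a small pool of uncertain candidates. The repository linked above is the place to look for the actual acquisition functions.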
| Original language | English |
|---|---|
| Pages (from-to) | 2814-2829 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 36 |
| Issue number | 3 |
| State | Published - 2026 |
| Externally published | Yes |
Keywords
- Active learning object detection
- acquisition function
- large-scale vision-language pre-trained model
- weakly supervised object detection
Fingerprint
Dive into the research topics of 'ILD: Image-Level Labels Driven Active Learning Object Detection'. Together they form a unique fingerprint.