
IMAGINARYNET: LEARNING OBJECT DETECTORS WITHOUT REAL IMAGES AND ANNOTATIONS

  • Minheng Ni
  • Zitong Huang
  • Kailai Feng
  • Wangmeng Zuo*
  • *Corresponding author for this work
  • Faculty of Computing, Harbin Institute of Technology

Research output: Contribution to conference › Paper › peer-review

Abstract

Without any need for real-world training examples, humans can detect a new category of object based simply on a language description of its visual characteristics. Empowering deep learning with this ability would enable neural networks to handle complex vision tasks, e.g., object detection, without collecting and annotating real images. To this end, this paper introduces a novel and challenging learning paradigm, Imaginary-Supervised Object Detection (ISOD), in which neither real images nor manual annotations are allowed for training object detectors. To address this challenge, we propose IMAGINARYNET, a framework that synthesizes images by combining a pretrained language model with a text-to-image synthesis model. Given a class label, the language model generates a full description of a scene containing the target object, and the text-to-image model then generates a photo-realistic image from that description. With the synthesized images and class labels, weakly supervised object detection can then be leveraged to accomplish ISOD. By gradually introducing real images and manual annotations, IMAGINARYNET can also be combined with other supervision settings to further boost detection performance. Experiments show that IMAGINARYNET can (i) obtain about 75% of the performance in ISOD of the weakly supervised counterpart with the same backbone trained on real data, and (ii) significantly improve the baseline while achieving state-of-the-art or comparable performance when incorporated into other supervision settings. Our code will be publicly available at https://github.com/kodenii/ImaginaryNet.
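
The data-generation step described in the abstract (class label, then scene description, then synthesized image paired with its class label) can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the names `ImaginarySample`, `build_imaginary_dataset`, `toy_describe`, and `toy_synthesize` are hypothetical, and the two toy functions are trivial stand-ins for a pretrained language model and a text-to-image model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImaginarySample:
    """One synthesized training example: an image with an image-level class label."""
    class_label: str
    description: str
    image: object  # whatever the text-to-image model returns


def build_imaginary_dataset(
    class_labels: List[str],
    describe: Callable[[str], str],       # stand-in for the language model
    synthesize: Callable[[str], object],  # stand-in for the text-to-image model
    samples_per_class: int = 2,
) -> List[ImaginarySample]:
    """Generate (image, class label) pairs without any real images or annotations."""
    dataset = []
    for label in class_labels:
        for _ in range(samples_per_class):
            description = describe(label)    # class label -> scene description
            image = synthesize(description)  # scene description -> image
            dataset.append(ImaginarySample(label, description, image))
    return dataset


# Hypothetical toy stand-ins so the sketch runs without any model weights.
def toy_describe(label: str) -> str:
    return f"A photo of a {label} in an everyday scene."


def toy_synthesize(description: str) -> object:
    return {"prompt": description}  # placeholder for pixel data


samples = build_imaginary_dataset(["dog", "bicycle"], toy_describe, toy_synthesize)
```

Because each sample carries only an image-level class label and no bounding box, the resulting set has the form that a weakly supervised object detector expects, which is how the abstract proposes to accomplish ISOD.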

Original language: English
State: Published - 2023
Externally published: Yes
Event: 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda
Duration: 1 May 2023 → 5 May 2023

Conference

Conference: 11th International Conference on Learning Representations, ICLR 2023
Country/Territory: Rwanda
City: Kigali
Period: 1/05/23 → 5/05/23
