Skip to main navigation Skip to search Skip to main content

Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

  • Baoshuo Kan
  • , Teng Wang
  • , Wenpeng Lu*
  • , Xiantong Zhen
  • , Weili Guan
  • , Feng Zheng*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated great capacity of transfer learning. Recently, learnable prompts achieve state-of-the-art performance, which however are prone to overfit to seen classes, failing to generalize to unseen classes. In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models. Our approach takes the inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects. Specifically, we design two complementary types of knowledge-aware prompts for the text encoder to leverage the distinctive characteristics of category-related external knowledge. The discrete prompt extracts the key information from descriptions of an object category, and the learned continuous prompt captures overall contexts. We further design an adaptation head for the visual encoder to aggregate salient attentive visual cues, which establishes discriminative and task-aware visual representations. We conduct extensive experiments on 11 widely-used benchmark datasets and the results verify the effectiveness in few-shot image classification, especially in generalizing to unseen categories. Compared with the state-of-the-art CoCoOp method, KAPT exhibits favorable performance and achieves an absolute gain of 3.22% on new classes and 2.57% in terms of harmonic mean.

Original languageEnglish
Title of host publicationProceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages15624-15634
Number of pages11
ISBN (Electronic)9798350307184
DOIs
StatePublished - 2023
Externally publishedYes
Event2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Paris, France
Duration: 2 Oct 20236 Oct 2023

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
ISSN (Print)1550-5499

Conference

Conference2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Country/TerritoryFrance
CityParis
Period2/10/236/10/23

Fingerprint

Dive into the research topics of 'Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models'. Together they form a unique fingerprint.

Cite this