Towards Robust Visual Question Answering via Prompt-Driven Geometric Harmonization

  • Yishu Liu
  • Jiawei Zhu
  • Congcong Wen
  • Guangming Lu*
  • Hui Lin
  • Bingzhi Chen*
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen
  • Beijing Institute of Technology
  • China Academy of Electronics and Information Technology

Research output: Contribution to journal › Conference article › peer-review

Abstract

Visual Question Answering (VQA) has garnered significant attention as a crucial link between vision and language, aimed at generating accurate responses to visual queries. However, current VQA models still struggle with the challenges of minority class collapse and spurious semantic correlations posed by language bias and imbalanced distributions. To address these challenges, this paper proposes a novel Prompt-Driven Geometric Harmonization (PDGH) paradigm, which integrates both geometric structure and information entropy principles to enhance the ability of VQA models to generalize effectively across diverse scenarios. Specifically, our PDGH approach is designed to generate prompts from the image, guided by specific question cues, facilitating a more accurate and context-aware understanding of the visual content. Moreover, we project the prompt-visual-question and visual-question joint representations into a unified hypersphere space, applying feature weight self-orthogonality and prompt-information entropy correction constraints to optimize the margin, further alleviating minority class collapse and correcting language bias. To maintain the geometric integrity of the representation space, we introduce multi-space geometric contrast constraints to minimize the impact of spurious priors introduced during training. Finally, a semantic matrix is constructed for the coordinated joint representation to ensure that the learned instances are semantically consistent and to improve reasoning ability. Extensive experiments on various general and medical VQA datasets demonstrate the consistent superiority of our PDGH approach over existing state-of-the-art baselines.
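
The abstract names a unified hypersphere space and a feature weight self-orthogonality constraint but gives no formulas. As a rough illustration only, the sketch below shows one common way such ideas are expressed: vectors are L2-normalized onto the unit hypersphere, and a penalty measures how far the normalized class-weight vectors are from being mutually orthogonal. The function names, the squared Frobenius form of the penalty, and the example weights are assumptions made for illustration, not the formulation used in the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def orthogonality_penalty(weights):
    """Squared Frobenius norm of (W W^T - I) for row-normalized weights W.

    This is a generic orthogonality regularizer (an assumption here),
    not necessarily the constraint defined in the paper.
    """
    rows = [l2_normalize(w) for w in weights]
    penalty = 0.0
    for i, wi in enumerate(rows):
        for j, wj in enumerate(rows):
            dot = sum(a * b for a, b in zip(wi, wj))
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty


# Hypothetical class-weight vectors, chosen only to contrast two cases.
well_separated = [[2.0, 0.0], [0.0, 3.0]]  # orthogonal directions
nearly_collapsed = [[1.0, 0.0], [1.0, 0.1]]  # almost the same direction

print(orthogonality_penalty(well_separated))
print(orthogonality_penalty(nearly_collapsed))
```

Under these assumptions the penalty is zero when the class directions are orthogonal and grows as they crowd together, which is the kind of geometric pressure one might use to keep minority classes from collapsing onto majority ones.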

Original language: English
Pages (from-to): 5721-5729
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 6
DOIs
State: Published - 11 Apr 2025
Externally published: Yes
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 2025 to 4 Mar 2025
