KG-CMI: Knowledge Graph Enhanced Cross-Mamba Interaction for Medical Visual Question Answering

  • Xianyao Zheng
  • Hong Yu
  • Hui Cui
  • Changming Sun
  • Xiangyu Li
  • Ran Su
  • Leyi Wei
  • Jia Zhou
  • Junbo Wang
  • Qiangguo Jin*
  • *Corresponding author for this work
  • Northwestern Polytechnical University Xian
  • Tianjin Central Hospital of Gynecology Obstetrics
  • La Trobe University
  • CSIRO
  • School of Computer Science and Technology, Harbin Institute of Technology
  • Tianjin University
  • Macao Polytechnic University
  • Tianjin Chest Hospital

Research output: Contribution to journal > Article > peer-review

Abstract

Medical visual question answering (Med-VQA) is a crucial multimodal task in clinical decision support and telemedicine. Recent methods fail to fully leverage domain-specific medical knowledge, making it difficult to accurately associate lesion features in medical images with key diagnostic criteria. In addition, classification-based approaches typically rely on predefined answer sets. Treating Med-VQA as a simple classification problem limits its ability to adapt to the diversity of free-form answers and may overlook detailed semantic information in those answers. To address these challenges, we propose a knowledge graph enhanced cross-Mamba interaction (KG-CMI) framework, which consists of a fine-grained cross-modal feature alignment module, a knowledge graph embedding module, a cross-modal interaction representation module, and a free-form answer enhanced multitask learning (FAMT) module. The KG-CMI learns cross-modal feature representations for images and texts by effectively integrating professional medical knowledge through a graph, establishing associations between lesion features and disease knowledge. Moreover, FAMT leverages auxiliary knowledge from open-ended questions, improving the model's capability for open-ended Med-VQA. Experimental results demonstrate that KG-CMI outperforms existing state-of-the-art methods on three Med-VQA datasets, i.e., VQA-RAD, SLAKE, and OVQA. In addition, we conduct interpretability experiments to further validate the framework's effectiveness.

Original language: English
Journal: IEEE Transactions on Industrial Informatics
State: Accepted/In press - 2026
Externally published: Yes

Keywords

  • Cross-Mamba interaction
  • knowledge graph
  • medical visual question answering (Med-VQA)
  • multitask learning
