MedFILIP: Medical Fine-Grained Language-Image Pre-Training

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Harbin Medical University
  • Northeast Forestry University
  • King Abdullah University of Science and Technology
  • Case Western Reserve University

Research output: Contribution to journal › Article › peer-review

Abstract

Medical vision-language pretraining (VLP) that leverages naturally paired medical image-report data is crucial for medical image analysis. However, existing methods struggle to accurately characterize associations between images and diseases, leading to inaccurate or incomplete diagnostic results. In this work, we propose MedFILIP, a fine-grained VLP model that introduces medical image-specific knowledge through contrastive learning. Specifically: 1) An information extractor based on a large language model is proposed to decouple comprehensive disease details from reports. It excels at extracting disease details through flexible prompt engineering, thereby effectively reducing text complexity while retaining rich information at minimal cost. 2) A knowledge injector is proposed to construct relationships between categories and visual attributes, which helps the model make judgments based on image features and fosters knowledge extrapolation to unfamiliar disease categories. 3) A semantic similarity matrix based on fine-grained annotations is proposed, providing smoother, more informative labels and thus enabling fine-grained image-text alignment. 4) We validate MedFILIP on numerous datasets, e.g., RSNA-Pneumonia, NIH ChestX-ray14, VinBigData, and COVID-19. For single-label, multi-label, and fine-grained classification, our model achieves state-of-the-art performance, with classification accuracy improving by up to 6.69%.
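
The abstract does not specify how the semantic similarity matrix is built or how it enters the loss, so the following is only a minimal sketch of the general idea of replacing one-hot contrastive targets with soft ones. It assumes, purely for illustration, that each image and each text carries a set of fine-grained annotation strings, that similarity is measured by Jaccard overlap, and that the soft targets feed an image-to-text cross-entropy. The function names, the Jaccard choice, and the temperature value are hypothetical and are not taken from MedFILIP.

```python
import numpy as np

def jaccard_similarity_matrix(image_labels, text_labels):
    """Pairwise Jaccard overlap between annotation sets (an assumed stand-in
    for a semantic similarity matrix)."""
    sim = np.zeros((len(image_labels), len(text_labels)))
    for i, a in enumerate(image_labels):
        for j, b in enumerate(text_labels):
            union = a | b
            sim[i, j] = len(a & b) / len(union) if union else 0.0
    return sim

def log_softmax(x, axis=-1):
    # Subtract the row maximum first for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def soft_target_contrastive_loss(img_emb, txt_emb, sim, temperature=0.07):
    """Image-to-text cross-entropy against row-normalized soft targets."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    # Normalize each row of the similarity matrix into a distribution;
    # the clip guards against rows with no overlap at all.
    targets = sim / np.clip(sim.sum(axis=1, keepdims=True), 1e-12, None)
    return float(-(targets * log_softmax(logits, axis=1)).sum(axis=1).mean())

# Toy batch: the first two reports share a disease but differ in location.
labels = [
    {"pneumonia", "left lower lobe"},
    {"pneumonia", "right upper lobe"},
    {"cardiomegaly"},
]
sim = jaccard_similarity_matrix(labels, labels)
loss = soft_target_contrastive_loss(np.eye(3), np.eye(3), sim)
```

With one-hot targets the two pneumonia reports would be treated as unrelated negatives; here the off-diagonal entry between them is nonzero, so the target distribution gives partial credit for matching a related report.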

Original language: English
Pages (from-to): 3587-3597
Number of pages: 11
Journal: IEEE Journal of Biomedical and Health Informatics
Volume: 29
Issue number: 5
State: Published - 2025
Externally published: Yes

Keywords

  • CXR imaging
  • Contrastive learning
  • Fine-grained
  • Interpretability
  • Vision-language pretraining
