
Medical multimodal large language models: A survey

  • Hanguang Xiao
  • Ningzhi Hui
  • Yong Xu*
  • Zhipeng Li
  • Jincheng Peng
  • *Corresponding author for this work
  • Chongqing Institute of Technology
  • Chongqing Key Laboratory of Embodied Intelligence Perception and Autonomous Learning for Humanoid Robots
  • Key Laboratory of Advanced Equipment Intelligence of Chongqing Education Commission
  • Harbin Institute of Technology

Research output: Contribution to journal; Article; peer-reviewed

Abstract

In recent years, multimodal large language models (MLLMs) have gradually given rise to medical multimodal large language models (Medical MLLMs) through the integration of multimodal data such as clinical reports, medical images, physiological signals, and doctor-patient conversations. This progress has substantially improved the efficiency and quality of clinical question answering. Given the rapid development of this field and its broad clinical potential, this survey presents a systematic review of the core tasks, fundamental principles, methodological innovations, and future research directions of Medical MLLMs. Specifically, this survey first outlines the core medical tasks addressed by Medical MLLMs. It then dissects the three key modules of Medical MLLMs and, through a fine-grained vector-level analysis, explains how input feature vectors are processed and propagated across these modules. Subsequently, existing medical datasets are systematically categorized according to different stages of model training to help researchers identify relevant resources more efficiently. Furthermore, this survey elaborates on the training and evaluation methods of Medical MLLMs and discusses advanced strategies for activating their reasoning capabilities. Finally, it summarizes their practical applications in medical scenarios and highlights the critical challenges and future directions. This survey aims to provide a comprehensive reference for researchers in the field of Medical MLLMs and to promote the adaptation of Medical MLLMs to increasingly complex and diverse medical tasks.

Original language: English
Article number: 104386
Journal: Information Fusion
Volume: 134
State: Published - Oct 2026
Externally published: Yes

Keywords

  • Chain-of-thought reasoning
  • Medical image understanding
  • Medical large language models
  • Multimodal large language models
  • Reinforcement learning
