Abstract
In recent years, multimodal large language models (MLLMs) have gradually given rise to medical multimodal large language models (Medical MLLMs) through the integration of multimodal data such as clinical reports, medical images, physiological signals, and doctor-patient conversations. This progress has substantially improved the efficiency and quality of clinical question answering. Given the rapid development of this field and its broad clinical potential, this survey presents a systematic review of the core tasks, fundamental principles, methodological innovations, and future research directions of Medical MLLMs. Specifically, this survey first outlines the core medical tasks addressed by Medical MLLMs. It then dissects the three key modules of Medical MLLMs and, through a fine-grained vector-level analysis, explains how input feature vectors are processed and propagated across these modules. Subsequently, existing medical datasets are systematically categorized according to different stages of model training to help researchers identify relevant resources more efficiently. Furthermore, this survey elaborates on the training and evaluation methods of Medical MLLMs and discusses advanced strategies for activating their reasoning capabilities. Finally, it summarizes their practical applications in medical scenarios and highlights the critical challenges and future directions. This survey aims to provide a comprehensive reference for researchers in the field of Medical MLLMs and to promote the adaptation of Medical MLLMs to increasingly complex and diverse medical tasks.
| Original language | English |
|---|---|
| Article number | 104386 |
| Journal | Information Fusion |
| Volume | 134 |
| DOIs | |
| State | Published - Oct 2026 |
| Externally published | Yes |
Keywords
- Chain-of-thought reasoning
- Medical image understanding
- Medical large language models
- Multimodal large language models
- Reinforcement learning
Fingerprint
Dive into the research topics of 'Medical multimodal large language models: A survey'. Together they form a unique fingerprint.