A Survey on Multimodal Recommender Systems: Recent Advances and Future Directions

  • Jinfeng Xu
  • Zheyu Chen
  • Shuo Yang
  • Jinze Li
  • Wei Wang
  • Xiping Hu
  • Steven Hoi
  • Edith Ngai*
  • *Corresponding author for this work
  • The University of Hong Kong
  • Hong Kong Polytechnic University
  • Shenzhen MSU-BIT University
  • Beijing Institute of Technology
  • Singapore Management University

Research output: Contribution to journal › Review article › peer-review

Abstract

The exponential growth of online information has made it increasingly difficult for users to identify valuable and relevant content. Recommender systems have emerged as a critical solution to this challenge by tailoring content to individual preferences. With the proliferation of diverse multimedia services, human interaction with the digital world has become inherently multimodal. Consequently, recommender systems capable of comprehending and interpreting multimodal information can more effectively align with individual preferences. Despite its recent surge in research attention, the field of multimodal recommender systems (MRS) still lacks a comprehensive technical survey. Existing surveys suffer from two critical limitations: 1) Insufficient technical depth: Prior works predominantly focus on categorizing and discussing general structure, neglecting rigorous technical analysis of methodologies and architectures. 2) Absence of cutting-edge works: Due to the rapid evolution of AI technologies, current surveys fail to discuss the latest works that adopt the most advanced techniques. To bridge these gaps, this survey conducts a systematic and technical review of advanced MRS works from MRS's inception to the present. We organize existing works into coherent taxonomies based on their structure and provide in-depth analyses of methodological innovations at each component, including Feature Extraction, Encoder, Multimodal Fusion, and Loss Function. We also discuss potential future directions for developing and enhancing MRS. This survey serves as technical guidance for researchers and practitioners, offering insights into the developments, techniques, and future directions of MRS. Notably.

Original language: English
Journal: IEEE Transactions on Multimedia
State: Accepted/In press - 2026
Externally published: Yes

Keywords

  • Data mining
  • Information systems
  • Multimedia information systems
  • Multimodal recommender systems
