Abstract
The exponential growth of online information has made it increasingly difficult for users to identify valuable and relevant content. Recommender systems have emerged as a critical solution to this challenge by tailoring content to individual preferences. With the proliferation of diverse multimedia services, human interaction with the digital world has become inherently multimodal. Consequently, recommender systems capable of comprehending and interpreting multimodal information can more effectively align with individual preferences. Despite a recent surge in research attention, the field of multimodal recommender systems (MRS) still lacks a comprehensive technical survey. Existing surveys suffer from two critical limitations: 1) Insufficient technical depth: prior works predominantly categorize and discuss general structure, neglecting rigorous technical analysis of methodologies and architectures. 2) Absence of cutting-edge works: given the rapid evolution of AI technologies, current surveys do not cover the latest works that adopt the most advanced techniques. To bridge these gaps, this survey conducts a systematic and technical review of MRS works from the field's inception to the present. We organize existing works into coherent taxonomies based on their structure and provide in-depth analyses of methodological innovations at each component, including Feature Extraction, Encoder, Multimodal Fusion, and Loss Function. We also discuss potential future directions for developing and enhancing MRS. This survey serves as technical guidance for researchers and practitioners, offering insights into the developments, techniques, and future directions of MRS.
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Multimedia |
| State | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- Data mining
- Information systems
- Multimedia information systems
- Multimodal recommender systems
Fingerprint
Dive into the research topics of 'A Survey on Multimodal Recommender Systems: Recent Advances and Future Directions'. Together they form a unique fingerprint.