
CurMIM: Curriculum Masked Image Modeling

  • Hao Liu
  • Kun Wang
  • Yudong Han
  • Haocong Wang
  • Yupeng Hu*
  • Chunxiao Wang
  • Liqiang Nie
  • *Corresponding author for this work
  • Shandong University
  • Qilu University of Technology
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Masked Image Modeling (MIM), which follows the “mask-and-reconstruct” scheme, is a promising self-supervised method for learning scalable visual representations. Studies indicate that selecting an effective mask strategy is vital for MIM. However, existing approaches often rely on static pre-defined priors, which limit their ability to adapt mask strategies dynamically for network optimization. In this paper, we focus on the learning process of the network, introduce a human-like curriculum into MIM for dynamic representation refinement, and propose an end-to-end framework, Curriculum Masked Image Modeling (CurMIM). CurMIM consists of two components: a Mask Priority Measurer, which acts as a curriculum learner to determine mask priority values using the network's intrinsic state information, and a Dual Adaptive Selector, which serves as a curriculum scheduler to create effective masks based on these values. With negligible extra parameters, our curriculum-based method consistently yields noticeable improvements across varying model sizes and benchmarks, demonstrating its effectiveness and generalization.
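The abstract does not specify how priority values are computed or how the scheduler turns them into masks, so the following is only an illustrative sketch of the general idea of priority-driven, progress-dependent masking, not the authors' implementation. The function name `curriculum_mask`, the per-patch `priorities` scores, and the linear `progress` mixing rule are all assumptions introduced here for illustration.

```python
import random


def curriculum_mask(priorities, mask_ratio, progress, rng):
    """Return sorted indices of patches to mask (hypothetical sketch).

    priorities: one assumed score per patch (higher = mask sooner).
    mask_ratio: fraction of patches to mask, in [0, 1].
    progress:   training progress in [0, 1]; 0 gives fully random masking,
                1 gives masking driven entirely by priority.
    rng:        a random.Random instance, passed in for reproducibility.
    """
    n = len(priorities)
    num_masked = round(mask_ratio * n)
    # Assumed schedule: the priority-driven share grows linearly with progress.
    num_by_priority = round(progress * num_masked)

    # Rank patches from highest to lowest priority (ties broken by index).
    ranked = sorted(range(n), key=lambda i: (-priorities[i], i))
    chosen = ranked[:num_by_priority]

    # Fill the rest of the mask uniformly at random from the remaining patches.
    remaining = ranked[num_by_priority:]
    chosen += rng.sample(remaining, num_masked - num_by_priority)
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 4x4 grid of patches with made-up priority scores.
    scores = [rng.random() for _ in range(16)]
    for progress in (0.0, 0.5, 1.0):
        print(progress, curriculum_mask(scores, 0.75, progress, rng))
```

In this sketch the mask size stays fixed while its composition shifts from random to priority-driven as training advances. A static strategy corresponds to holding `progress` constant.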

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
State: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 to 11 Apr 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 to 11/04/25

Keywords

  • Curriculum Learning
  • Image Representation
  • Masked Image Modeling
  • Visual Pre-training
