
Layer-Specific Knowledge Distillation for Class Incremental Semantic Segmentation

  • Qilong Wang
  • Yiwen Wu
  • Liu Yang
  • Wangmeng Zuo
  • Qinghua Hu*
  • *Corresponding author for this work
  • Tianjin University
  • School of Computer Science and Technology, Harbin Institute of Technology
  • Ministry of Education of the People's Republic of China

Research output: Contribution to journal › Article › peer-review

Abstract

Class incremental semantic segmentation (CISS) toward the practical open-world setting has recently attracted increasing research interest; its main challenge is the well-known issue of catastrophic forgetting. In particular, knowledge distillation (KD) techniques have been widely studied to alleviate catastrophic forgetting. Despite their promising performance, existing KD-based methods generally apply the same distillation scheme to different intermediate layers to transfer old knowledge, while using manually tuned, fixed trade-off weights to control the effect of KD. They thus take no account of the feature characteristics of different intermediate layers, limiting the effectiveness of KD for CISS. In this paper, we propose a layer-specific knowledge distillation (LSKD) method that assigns appropriate distillation schemes and weights to the various intermediate layers according to their feature characteristics, aiming to further exploit the potential of KD for improving CISS. Specifically, we present a mask-guided distillation (MD) scheme to alleviate background shift on semantic features, which performs distillation while masking out the features affected by the background. Furthermore, a mask-guided context distillation (MCD) scheme is presented to exploit the global context information in high-level semantic features. Based on these, LSKD assigns different distillation schemes according to feature characteristics. To adaptively adjust the effect of layer-specific distillation, LSKD introduces a regularized gradient-equilibrium method that learns dynamic trade-off weights. In addition, LSKD learns the distillation schemes and trade-off weights of different layers simultaneously via a bi-level optimization method. Extensive experiments on the widely used Pascal VOC 2012 and ADE20K benchmarks show that LSKD clearly outperforms its counterparts and achieves state-of-the-art results.
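The mask-guided distillation (MD) idea described above can be illustrated with a minimal sketch. The paper's exact formulation is not given here; this sketch only assumes a generic setup in which intermediate features of the current model are matched to those of the frozen old model via an L2 loss, with a boolean background mask excluding pixels affected by background shift. The function name `mask_guided_distillation` and the mask semantics are illustrative assumptions, not the authors' implementation.

```python
import torch

def mask_guided_distillation(feat_new, feat_old, bg_mask):
    """Hypothetical sketch of mask-guided feature distillation.

    feat_new, feat_old: (B, C, H, W) intermediate features from the
        current model and the frozen old model, respectively.
    bg_mask: (B, H, W) boolean tensor, True where a pixel is labeled
        background (and may hide old or future classes), so it is
        excluded from distillation.
    Returns a scalar loss averaged over the kept (non-background)
    positions and channels.
    """
    keep = (~bg_mask).unsqueeze(1).float()           # (B, 1, H, W)
    diff = (feat_new - feat_old) ** 2 * keep         # zero out background
    denom = keep.sum() * feat_new.shape[1] + 1e-6    # kept pixels x channels
    return diff.sum() / denom
```

With an all-False mask this reduces to ordinary feature distillation over every pixel; as more pixels are flagged as background, they simply stop contributing to the loss rather than pulling the new features toward stale background statistics.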

Original language: English
Pages (from-to): 1977-1989
Number of pages: 13
Journal: IEEE Transactions on Image Processing
Volume: 33
DOIs
State: Published - 2024
Externally published: Yes

Keywords

  • Knowledge distillation
  • incremental learning
  • semantic segmentation
