
ConSept: Continual semantic segmentation via adapter-based vision transformer

  • Bowen Dong
  • Guanglei Yang
  • Lei Zhang
  • Wangmeng Zuo*
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Hong Kong Polytechnic University

Research output: Contribution to journal › Article › peer-review

Abstract

Current continual semantic segmentation methods face two critical limitations: architectural rigidity, which often leads to catastrophic forgetting, and suboptimal data replay mechanisms, which fail to provide full supervision from old classes. To address these, we propose ConSept, a Vision Transformer-based continual semantic segmentation framework that uses visual adapters. From the architecture perspective, ConSept integrates lightweight attention-based adapters into vanilla ViTs, within a simplified architecture consisting of a ViT with a linear segmentation head. From the data perspective, instead of replaying entire old-class exemplars, we propose old instance replay, which improves old-new class discrimination by directly inserting old-class instances into new training images. Furthermore, we adopt a tailored training strategy that combines distillation with a deterministic old-class boundary and dual Dice losses to further strengthen segmentation performance. Extensive experiments on multiple benchmarks under both overlapped and disjoint settings show that ConSept achieves competitive performance compared to state-of-the-art methods, offering a promising solution for continual semantic segmentation.
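
The idea of inserting old-class instances into new training images can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `paste_old_instance`, its parameters, the grayscale nested-list image format, and the fixed paste position are all assumptions made for illustration, and the abstract does not say how instances are selected, placed, or blended.

```python
def paste_old_instance(image, label, inst_pixels, inst_mask, old_class_id, top, left):
    """Hypothetical helper: paste one old-class instance into a new training sample.

    image, label : 2D lists (H x W), the new image (grayscale for brevity)
                   and its per-pixel class labels.
    inst_pixels  : 2D list (h x w) holding the stored instance crop.
    inst_mask    : 2D list (h x w), 1 where the instance is present, else 0.
    old_class_id : label written wherever the instance is pasted.
    top, left    : paste position of the crop's top-left corner (assumed given).
    """
    # Copy the inputs so the original training sample is left untouched.
    out_image = [row[:] for row in image]
    out_label = [row[:] for row in label]
    height, width = len(image), len(image[0])

    for dy, mask_row in enumerate(inst_mask):
        for dx, inside in enumerate(mask_row):
            y, x = top + dy, left + dx
            # Only overwrite pixels covered by the instance and inside the image.
            if inside and 0 <= y < height and 0 <= x < width:
                out_image[y][x] = inst_pixels[dy][dx]
                out_label[y][x] = old_class_id
    return out_image, out_label


# Toy example: a 3x4 image labeled entirely as new class 7,
# with an L-shaped instance of old class 2 pasted at row 1, column 2.
image = [[0] * 4 for _ in range(3)]
label = [[7] * 4 for _ in range(3)]
inst_pixels = [[9, 9], [9, 9]]
inst_mask = [[1, 0], [1, 1]]

new_image, new_label = paste_old_instance(
    image, label, inst_pixels, inst_mask, old_class_id=2, top=1, left=2
)
```

In this toy setup the pasted sample carries labels for both the new class and the old class in a single image, which is the kind of old-class supervision the abstract describes; a real pipeline would operate on image tensors and would need its own placement and augmentation policy.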

Original language: English
Pages (from-to): 64-71
Number of pages: 8
Journal: Pattern Recognition Letters
Volume: 204
DOIs
State: Published - Jun 2026

Keywords

  • Continual learning
  • Semantic segmentation
  • Vision transformer
  • Visual adapter
