Abstract
Current continual semantic segmentation methods face two critical limitations: architectural rigidity that often leads to catastrophic forgetting, and suboptimal data replay mechanisms that fail to provide full supervision for old classes. To address these limitations, we propose ConSept, a Vision Transformer (ViT)-based continual semantic segmentation framework built on visual adapters. From the architecture perspective, ConSept keeps the simple design of a ViT with a linear segmentation head and integrates lightweight attention-based adapters into the vanilla ViT. From the data perspective, instead of replaying entire old-class exemplars, we propose old instance replay, which improves discrimination between old and new classes by directly inserting old-class instances into new training images. Furthermore, we adopt a tailored training strategy that combines distillation with a deterministic old-class boundary and dual Dice losses to further strengthen segmentation performance. Extensive experiments on multiple benchmarks under both overlapped and disjoint settings show that ConSept achieves competitive performance compared with state-of-the-art methods, offering a promising solution for continual semantic segmentation.
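The abstract describes old instance replay only at a high level. The following is a minimal sketch of one plausible reading, not ConSept's implementation. It assumes that a stored old-class instance is available as a pixel patch plus a binary mask and is pasted at a chosen offset into a new-task image, with the label map updated at the same positions. The functions `old_instance_replay` and `soft_dice_loss` and all their parameters are hypothetical. The block also includes a standard single-class soft Dice loss for reference. The abstract does not say how ConSept defines or combines its two Dice losses.

```python
from typing import List, Sequence, Tuple, TypeVar

Pixel = TypeVar("Pixel")  # hypothetical pixel type, e.g. an RGB tuple or a grayscale int


def old_instance_replay(
    image: List[List[Pixel]],
    label: List[List[int]],
    inst_pixels: List[List[Pixel]],
    inst_mask: List[List[bool]],
    inst_class: int,
    top: int,
    left: int,
) -> Tuple[List[List[Pixel]], List[List[int]]]:
    """Hypothetical sketch: paste one stored old-class instance into a new-task sample.

    Pixels covered by inst_mask overwrite the image, and the same positions
    receive the old class id in the label map. The old class is therefore
    supervised by its stored annotation instead of being absent from the sample.
    """
    out_img = [row[:] for row in image]  # copy so the inputs stay unchanged
    out_lbl = [row[:] for row in label]
    height, width = len(image), len(image[0])
    for i, mask_row in enumerate(inst_mask):
        for j, inside in enumerate(mask_row):
            y, x = top + i, left + j
            if inside and 0 <= y < height and 0 <= x < width:  # clip at image borders
                out_img[y][x] = inst_pixels[i][j]
                out_lbl[y][x] = inst_class
    return out_img, out_lbl


def soft_dice_loss(probs: Sequence[float], target: Sequence[int], eps: float = 1.0) -> float:
    """Standard soft Dice loss for one class over flattened pixels.

    probs: predicted probabilities for the class; target: 0/1 ground truth.
    eps smooths the ratio and avoids division by zero on empty masks.
    """
    intersection = sum(p * g for p, g in zip(probs, target))
    denom = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(4)]  # 4x4 grayscale new-task image
    label = [[5] * 4 for _ in range(4)]  # every pixel labelled with new class 5
    _, lbl = old_instance_replay(
        image, label, [[9, 9], [9, 9]], [[True, False], [True, True]],
        inst_class=2, top=1, left=1,
    )
    print(lbl)  # → [[5, 5, 5, 5], [5, 2, 5, 5], [5, 2, 2, 5], [5, 5, 5, 5]]
```

Pasting at the instance level, rather than replaying whole exemplar images, places old-class pixels next to new-class pixels in the same image. This matches the abstract's stated motivation of improving discrimination between old and new classes. How instances are selected, stored, and positioned in ConSept is not specified here.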
| Original language | English |
|---|---|
| Pages (from-to) | 64-71 |
| Number of pages | 8 |
| Journal | Pattern Recognition Letters |
| Volume | 204 |
| State | Published - Jun 2026 |
Keywords
- Continual learning
- Semantic segmentation
- Vision transformer
- Visual adapter
Fingerprint
Research topics of 'ConSept: Continual semantic segmentation via adapter-based vision transformer'.