Abstract
Salient Object Detection (SOD) aims to identify the most visually distinctive objects in images, with broad applications in object detection, image classification, and image synthesis. Most existing SOD methods adopt supervised learning frameworks that rely heavily on labeled images as supervision signals. However, these methods often underperform in complex scenarios where camouflaged objects and backgrounds exhibit high similarity, primarily because of two limitations: (1) insufficient supervision from labels fails to capture holistic salient regions, and (2) task-driven supervised learning focuses too heavily on target objects while neglecting contextual receptive fields, resulting in elevated false-positive rates. To address these challenges, we propose a novel hybrid model, SCL-SOD, that integrates self-supervised contrastive representation learning with supervised learning in an encoder-decoder architecture with a T2T-ViT backbone. Specifically, our model has two key components: an Image-wise Contrastive Learning Encoder (ICLE), which enhances global feature discriminability by learning invariant representations across different augmented views, and a Pixel-wise Contrastive Learning Decoder (PCLD), which refines local prediction accuracy by enforcing feature consistency at the pixel level. The final optimization objective combines the weighted supervised detection loss with the self-supervised contrastive loss. Extensive experiments on six standard RGB benchmarks across five evaluation metrics demonstrate that SCL-SOD outperforms 11 state-of-the-art SOD methods, particularly in challenging scenarios with cluttered backgrounds.
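The way a supervised detection loss and a contrastive loss can be combined into one objective may be easier to see in a small sketch. The abstract does not specify the loss functions, so the code below assumes an InfoNCE-style contrastive term over two augmented views and a binary cross-entropy detection term. The names `info_nce`, `binary_cross_entropy`, `hybrid_loss`, `sup_weight`, `con_weight`, and `temperature` are all hypothetical and are not taken from the paper.

```python
import math


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the contrastive term).

    Row i of view_a and row i of view_b are embeddings of two augmented
    views of the same image; every other row of view_b acts as a negative.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of the anchor to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted saliency values and labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def hybrid_loss(pred, target, view_a, view_b, sup_weight=1.0, con_weight=0.5):
    """Weighted sum of the supervised and contrastive terms (weights are assumptions)."""
    return (sup_weight * binary_cross_entropy(pred, target)
            + con_weight * info_nce(view_a, view_b))


if __name__ == "__main__":
    pred, target = [0.9, 0.1], [1.0, 0.0]
    view_a = [[1.0, 0.0], [0.0, 1.0]]
    view_b = [[1.0, 0.0], [0.0, 1.0]]
    print(hybrid_loss(pred, target, view_a, view_b))
```

In this sketch the contrastive term is small when the two views of each image map to nearby embeddings and large when they are confused with other images, which is one common way to encourage view-invariant representations. A pixel-level variant could, under the same assumptions, apply the same function to per-pixel feature vectors instead of per-image embeddings.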
| Original language | English |
|---|---|
| Article number | 132889 |
| Journal | Neurocomputing |
| Volume | 674 |
| State | Published - 14 Apr 2026 |
| Externally published | Yes |
Keywords
- Contrastive learning
- Salient object detection
- Self-supervised learning
- Transformer
Fingerprint
Dive into the research topics of 'SCL-SOD: A hybrid self-supervised contrastive learning framework for salient object detection'. Together they form a unique fingerprint.