Skip to main navigation Skip to search Skip to main content

DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection

  • Guiping Cao
  • , Xiangyuan Lan*
  • , Wenjian Huang
  • , Jianguo Zhang
  • , Dongmei Jiang
  • , Yaowei Wang
  • *Corresponding author for this work
  • Southern University of Science and Technology
  • Pengcheng Laboratory
  • Guangdong Artificial Intelligence and Digital Economy Laboratory - Guangzhou
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Popular transformer detectors have achieved promising performance through query-based learning using attention mechanisms. However, the roles of existing decoder query types (e.g., content query and positional query) are still underexplored. These queries are generally predefined with a fixed number (fixed-query), which limits their flexibility. We find that the learning of these fixed-query is impaired by Recurrent Opposing in Teractions (ROT) between two attention operations: Self-Attention (query-to-query) and Cross-Attention (query-to-encoder), thereby degrading decoder efficiency. Furthermore, "query ambiguity"arises when shared-weight decoder layers are processed with both one-to-one and one-to-many label assignments during training, violating DETR's one-to-one matching principle. To address these challenges, we propose DS-Det, a more efficient detector capable of detecting a flexible number of objects in images. Specifically, we reformulate and introduce a new unified Single-Query paradigm for decoder modeling, transforming the fixed-query into flexible. Furthermore, we propose a simplified decoder framework through attention disentangled learning: locating boxes with Cross-Attention (one-to-many process), deduplicating predictions with Self-Attention (one-to-one process), addressing ''query ambiguity'' and ''ROT'' issues directly, and enhancing decoder efficiency. We further introduce a unified PoCoo loss that leverages box size priors to prioritize query learning on hard samples such as small objects. Extensive experiments across five different backbone models on COCO2017 and WiderPerson datasets demonstrate the general effectiveness and superiority of DS-Det. The source codes are available at https://github.com/Med-Process/DS-Det/.

Original languageEnglish
Title of host publicationMM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
PublisherAssociation for Computing Machinery, Inc
Pages286-295
Number of pages10
ISBN (Electronic)9798400720352
DOIs
StatePublished - 27 Oct 2025
Externally publishedYes
Event33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 202531 Oct 2025

Publication series

NameMM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference33rd ACM International Conference on Multimedia, MM 2025
Country/TerritoryIreland
CityDublin
Period27/10/2531/10/25

Keywords

  • attention disentangled learning
  • efficient decoder
  • flexible object detection
  • single-query paradigm

Fingerprint

Dive into the research topics of 'DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection'. Together they form a unique fingerprint.

Cite this