Skip to main navigation Skip to search Skip to main content

MLP-DINO: Category Modeling and Query Graphing with Deep MLP for Object Detection

  • Guiping Cao
  • , Wenjian Huang
  • , Xiangyuan Lan*
  • , Jianguo Zhang*
  • , Dongmei Jiang
  • , Yaowei Wang
  • *Corresponding author for this work
  • Southern University of Science and Technology
  • Peng Cheng Laboratory

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Popular transformer-based detectors detect objects in a one-to-one manner, where both the bounding box and category of each object are predicted only by the single query, leading to the box-sensitive category predictions. Additionally, the initialization of positional queries solely based on the predicted confidence scores or learnable embeddings neglects the significant spatial interrelation between different queries. This oversight leads to an imbalanced spatial distribution of queries (SDQ). In this paper, we propose a new MLP-DINO model to address these issues. Firstly, we present a new Query-Independent Category Supervision (QICS) approach for modeling categories information, decoupling the sensitive bounding box prediction process. Additionally, to further improve the category predictions, we introduce a deep MLP model into transformer-based detection framework to capture the long-range and short-range information simultaneously. Thirdly, to balance the SDQ, we design a novel Graph-based Query Selection (GQS) method that distributes each query point in a discrete manner by graphing the spatial information of queries to cover a broader range of potential objects, significantly enhancing the hit-rate of queries. Experimental results on COCO indicate that our MLP-DINO achieves 54.6% AP with only 44M parameters under 36-epoch setting, greatly outperforming the DINO by +3.7% AP with fewer parameters and FLOPs. The source codes will be available at https://github.com/Med-Process/MLP-DINO.

Original languageEnglish
Title of host publicationProceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
EditorsKate Larson
PublisherInternational Joint Conferences on Artificial Intelligence
Pages605-613
Number of pages9
ISBN (Electronic)9781956792041
StatePublished - 2024
Externally publishedYes
Event33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 - Jeju, Korea, Republic of
Duration: 3 Aug 20249 Aug 2024

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)1045-0823

Conference

Conference33rd International Joint Conference on Artificial Intelligence, IJCAI 2024
Country/TerritoryKorea, Republic of
CityJeju
Period3/08/249/08/24

Fingerprint

Dive into the research topics of 'MLP-DINO: Category Modeling and Query Graphing with Deep MLP for Object Detection'. Together they form a unique fingerprint.

Cite this