Abstract
Objects often differ in appearance because of viewpoint changes or part deformation. Modeling these variations reasonably remains a major challenge for object detection. In this paper, we propose a novel Deformable Template Network (DTN), which exploits the pictorial structure to model the possible variations of an object. DTN represents an object in a deformable way by means of a generated template. It has two key modules: the template generating module and the part matching module. The template generating module produces a template for a given object, which defines the anchor positions of the $k \times k$ parts. Based on this template, the part matching module performs part alignment around the anchor positions. For each part, the matching process makes a trade-off between maximizing the detection score and minimizing the deformation cost relative to the anchor position. Moreover, DTN is a fully convolutional network, which makes it competitive in terms of detection efficiency. We evaluate DTN on both the PASCAL VOC and MSCOCO datasets, achieving state-of-the-art results: an accuracy of 82.7% on PASCAL VOC and 44.9% on MSCOCO.
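The trade-off described in the abstract can be illustrated with a small sketch. This is not the DTN implementation: the function names `anchor_grid` and `match_part`, the square search window, the weight `lam`, and the quadratic deformation cost are all assumptions chosen for illustration (the quadratic cost is borrowed from classic pictorial-structure models, and the abstract does not state which cost DTN uses). In the network the template is generated by a learned module; here a uniform grid stands in for it.

```python
def anchor_grid(box, k):
    """Hypothetical stand-in for a template: centers of a k x k grid
    of equal cells inside box = (x0, y0, x1, y1), in row-major order."""
    x0, y0, x1, y1 = box
    cell_w, cell_h = (x1 - x0) / k, (y1 - y0) / k
    return [(x0 + (j + 0.5) * cell_w, y0 + (i + 0.5) * cell_h)
            for i in range(k) for j in range(k)]


def match_part(score_map, anchor, radius=2, lam=0.1):
    """Search a (2*radius+1)^2 window around the anchor and return the
    position maximizing score - lam * squared displacement.
    The quadratic deformation cost is an assumption, not the paper's."""
    h, w = len(score_map), len(score_map[0])
    ax, ay = int(round(anchor[0])), int(round(anchor[1]))
    best_value, best_pos = float("-inf"), (ax, ay)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = ax + dx, ay + dy
            if 0 <= x < w and 0 <= y < h:
                # Detection score minus the penalty for moving off the anchor.
                value = score_map[y][x] - lam * (dx * dx + dy * dy)
                if value > best_value:
                    best_value, best_pos = value, (x, y)
    return best_pos, best_value


# Toy score map with a single strong response at (x=3, y=3).
scores = [[0.0] * 8 for _ in range(8)]
scores[3][3] = 1.0
anchors = anchor_grid((0, 0, 8, 8), 2)
pos_soft, val_soft = match_part(scores, anchors[0], lam=0.1)
pos_stiff, val_stiff = match_part(scores, anchors[0], lam=1.0)
```

With a small `lam` the part moves off its anchor to reach the strong response; with a large `lam` the deformation penalty outweighs the score gain and the part stays at the anchor.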
| Original language | English |
|---|---|
| Pages (from-to) | 2058-2068 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 24 |
| State | Published - 2022 |
| Externally published | Yes |
Keywords
- deformable template
- deformation cost
- object detection
- part matching