Cascaded Tracking via Pyramid Dense Capsules

  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Tracking-by-detection is a two-stage framework: it first collects candidates around the target object and then classifies each candidate as either the target object or background. Although methods based on Convolutional Neural Networks (CNNs) have been successful in the tracking-by-detection framework, the inherent flaws of CNNs still affect performance. The underlying mechanism of CNNs relies on positional invariance (i.e., it loses the spatial relationships between features) and therefore cannot capture small affine transformations, which ultimately results in drift. To solve this problem, we exploit the spatial relationships encoded by Capsule Networks (CapsNets) for the tracking-by-detection framework. To strengthen the encoding power of convolutional capsules, we generate them through a pyramid dense capsules (PDCaps) architecture. Our pyramid dense capsule representation is useful in producing comprehensive spatial relationships within the input. In addition, the critical challenges in the tracking-by-detection framework are how to avoid overfitting and mismatch during training and inference, where a reasonable intersection over union (IoU) threshold for defining true/false positives is hard to set. To address the IoU threshold setting, we propose a cascaded PDCaps model, which consists of a sequence of PDCaps models trained with increasing IoU thresholds so that the quality of candidates improves stage by stage. Extensive experiments demonstrate that our tracker performs favorably against state-of-the-art approaches.
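
The role of the IoU threshold described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the box format `(x1, y1, x2, y2)`, the function names `iou`, `label_candidates`, and `cascade_labels`, and the threshold values `0.5, 0.6, 0.7` are all assumptions chosen for illustration. The sketch only shows how the same candidates are relabeled as positives or negatives when successive stages use increasing IoU thresholds.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates: Sequence[Box], target: Box,
                     threshold: float) -> List[int]:
    """Label each candidate 1 (target) or 0 (background) at one threshold."""
    return [1 if iou(c, target) >= threshold else 0 for c in candidates]


def cascade_labels(candidates: Sequence[Box], target: Box,
                   thresholds: Sequence[float] = (0.5, 0.6, 0.7)
                   ) -> Dict[float, List[int]]:
    """Labels per stage; the threshold values are illustrative assumptions."""
    return {t: label_candidates(candidates, target, t) for t in thresholds}


if __name__ == "__main__":
    target = (0.0, 0.0, 10.0, 10.0)
    # Candidates shifted horizontally by 0, 1, 3, and 5 pixels.
    candidates = [(dx, 0.0, dx + 10.0, 10.0) for dx in (0.0, 1.0, 3.0, 5.0)]
    for t, labels in cascade_labels(candidates, target).items():
        print(t, labels)
```

In this toy setup the candidate shifted by 3 pixels counts as a positive at the first threshold but as a negative at the later, stricter ones, which is the kind of ambiguity a single fixed threshold cannot resolve.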

Original language: English
Title of host publication: Computer Vision – ECCV 2020 Workshops, Proceedings
Editors: Adrien Bartoli, Andrea Fusiello
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 683-696
Number of pages: 14
ISBN (Print): 9783030682378
State: Published - 2020
Externally published: Yes
Event: Workshops held at the 16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom
Duration: 23 Aug 2020 → 28 Aug 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12539 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Workshops held at the 16th European Conference on Computer Vision, ECCV 2020
Country/Territory: United Kingdom
City: Glasgow
Period: 23/08/20 → 28/08/20

Keywords

  • Cascaded architecture
  • Pyramid and dense capsules
  • Visual tracking
