TY - GEN
T1 - ODTrack
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
AU - Zheng, Yaozong
AU - Zhong, Bineng
AU - Liang, Qihua
AU - Mo, Zhiyi
AU - Zhang, Shengping
AU - Li, Xianxian
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - Online contextual reasoning and association across consecutive video frames are critical to perceive instances in visual tracking. However, most current top-performing trackers persistently lean on sparse temporal relationships between reference and search frames via an offline mode. Consequently, they can only interact independently within each image-pair and establish limited temporal correlations. To alleviate the above problem, we propose a simple, flexible and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames in an online token propagation manner. ODTrack receives video frames of arbitrary length to capture the spatio-temporal trajectory relationships of an instance, and compresses the discrimination features (localization information) of a target into a token sequence to achieve frame-to-frame association. This new solution brings the following benefits: 1) the purified token sequences can serve as prompts for the inference in the next video frame, whereby past information is leveraged to guide future inference; 2) the complex online update strategies are effectively avoided by the iterative propagation of token sequences, and thus we can achieve more efficient model representation and computation. ODTrack achieves a new SOTA performance on seven benchmarks, while running at real-time speed. Code and models are available at https://github.com/GXNU-ZhongLab/ODTrack.
AB - Online contextual reasoning and association across consecutive video frames are critical to perceive instances in visual tracking. However, most current top-performing trackers persistently lean on sparse temporal relationships between reference and search frames via an offline mode. Consequently, they can only interact independently within each image-pair and establish limited temporal correlations. To alleviate the above problem, we propose a simple, flexible and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames in an online token propagation manner. ODTrack receives video frames of arbitrary length to capture the spatio-temporal trajectory relationships of an instance, and compresses the discrimination features (localization information) of a target into a token sequence to achieve frame-to-frame association. This new solution brings the following benefits: 1) the purified token sequences can serve as prompts for the inference in the next video frame, whereby past information is leveraged to guide future inference; 2) the complex online update strategies are effectively avoided by the iterative propagation of token sequences, and thus we can achieve more efficient model representation and computation. ODTrack achieves a new SOTA performance on seven benchmarks, while running at real-time speed. Code and models are available at https://github.com/GXNU-ZhongLab/ODTrack.
UR - https://www.scopus.com/pages/publications/85189534010
U2 - 10.1609/aaai.v38i7.28591
DO - 10.1609/aaai.v38i7.28591
M3 - Conference contribution
AN - SCOPUS:85189534010
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 7588
EP - 7596
BT - Technical Tracks 14
A2 - Wooldridge, Michael
A2 - Dy, Jennifer
A2 - Natarajan, Sriraam
PB - Association for the Advancement of Artificial Intelligence
Y2 - 20 February 2024 through 27 February 2024
ER -