TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Tracking	DAVIS	TAPIR (MOVi-E)	Average Jaccard	59.8	# 2
Visual Tracking	DAVIS	TAPIR (Panning MOVi-E)	Average Jaccard	61.3	# 1
Visual Tracking	Kinetics	TAPIR (Panning MOVi-E)	Average Jaccard	57.2	# 1
Visual Tracking	Kinetics	TAPIR (MOVi-E)	Average Jaccard	57.1	# 2
Visual Tracking	Kubric	TAPIR (Panning MOVi-E)	Average Jaccard	84.7	# 1
Visual Tracking	Kubric	TAPIR (MOVi-E)	Average Jaccard	84.3	# 2
Visual Tracking	RGB-Stacking	TAPIR (Panning MOVi-E)	Average Jaccard	62.7	# 2
Visual Tracking	RGB-Stacking	TAPIR (MOVi-E)	Average Jaccard	66.2	# 1
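
The Average Jaccard (AJ) values above combine position accuracy and occlusion prediction in one score. The following is a minimal sketch of how such a score can be computed, assuming the TAP-Vid convention as commonly described: pixel thresholds of 1, 2, 4, 8 and 16, a true positive being a point that is visible in the ground truth, predicted visible, and within the threshold, and the Jaccard value being TP / (TP + FP + FN). The function name, argument layout, and toy data are hypothetical and are not taken from the official evaluation code. The sketch returns a fraction, whereas the table reports percentages.

```python
import math


def average_jaccard(gt_xy, gt_visible, pred_xy, pred_visible,
                    thresholds=(1, 2, 4, 8, 16)):
    """Average Jaccard over distance thresholds (illustrative sketch).

    All arguments are flat lists with one entry per (point, frame) pair.
    The thresholds are an assumption based on the TAP-Vid description.
    """
    scores = []
    for thr in thresholds:
        true_pos = false_pos = gt_pos = 0
        for (gx, gy), gv, (px, py), pv in zip(gt_xy, gt_visible,
                                              pred_xy, pred_visible):
            within = math.hypot(px - gx, py - gy) < thr
            if gv:
                gt_pos += 1          # every visible ground-truth point counts
            if pv and gv and within:
                true_pos += 1
            elif pv:
                false_pos += 1       # predicted visible, but occluded or too far
        denom = gt_pos + false_pos   # equals TP + FN + FP
        scores.append(true_pos / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: one exact hit, one 3-pixel miss, one 20-pixel miss, one occlusion.
gt_xy = [(10, 10), (20, 20), (30, 30), (40, 40)]
gt_visible = [True, True, True, False]
pred_xy = [(10, 10), (23, 20), (30, 50), (40, 40)]
pred_visible = [True, True, True, False]

aj = average_jaccard(gt_xy, gt_visible, pred_xy, pred_visible)
print(round(aj, 2))  # 0.38
```

In the toy data the 3-pixel miss counts as a false positive at thresholds 1 and 2 and as a true positive from threshold 4 upward, which is why the score lands between the strict and the lenient extremes.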

Dataset	Badge Markdown
DAVIS	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tapir-tracking-any-point-with-per-frame/visual-tracking-on-davis)](https://paperswithcode.com/sota/visual-tracking-on-davis?p=tapir-tracking-any-point-with-per-frame)`
Kinetics	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tapir-tracking-any-point-with-per-frame/visual-tracking-on-kinetics)](https://paperswithcode.com/sota/visual-tracking-on-kinetics?p=tapir-tracking-any-point-with-per-frame)`
Kubric	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tapir-tracking-any-point-with-per-frame/visual-tracking-on-kubric)](https://paperswithcode.com/sota/visual-tracking-on-kubric?p=tapir-tracking-any-point-with-per-frame)`
RGB-Stacking	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tapir-tracking-any-point-with-per-frame/visual-tracking-on-rgb-stacking)](https://paperswithcode.com/sota/visual-tracking-on-rgb-stacking?p=tapir-tracking-any-point-with-per-frame)`

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

ICCV 2023 · Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time, and can be flexibly extended to higher-resolution videos. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found on our project webpage.
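
The matching stage described in the abstract can be illustrated with a small sketch. It assumes that each frame has already been encoded into a grid of feature vectors and that the query point is represented by one feature vector; the best match in each frame is taken as the grid cell with the highest dot product. This hard argmax is a simplification chosen for readability and is not claimed to be the authors' implementation. The refinement stage is not covered. The names `match_per_frame` and `make_frame` are hypothetical.

```python
def match_per_frame(query_feat, frame_feats):
    """Pick one candidate location per frame, independently of other frames.

    query_feat:  list of C floats describing the query point.
    frame_feats: frame_feats[t][y][x] is a list of C floats.
    Returns a list of (x, y) grid coordinates, one per frame.
    """
    tracks = []
    for fmap in frame_feats:          # no information is shared across frames
        best_score, best_xy = float("-inf"), None
        for y, row in enumerate(fmap):
            for x, cell in enumerate(row):
                # Dot-product similarity between query and this grid cell.
                score = sum(q * f for q, f in zip(query_feat, cell))
                if score > best_score:
                    best_score, best_xy = score, (x, y)
        tracks.append(best_xy)
    return tracks


def make_frame(height, width, target_xy, feat):
    """Toy feature map: zeros everywhere except `feat` at `target_xy`."""
    zero = [0.0] * len(feat)
    return [[list(feat) if (x, y) == target_xy else list(zero)
             for x in range(width)]
            for y in range(height)]


query = [1.0, 0.0, 0.5]
video = [make_frame(4, 5, (1, 2), query),
         make_frame(4, 5, (3, 0), query),
         make_frame(4, 5, (4, 3), query)]

tracks = match_per_frame(query, video)
print(tracks)  # [(1, 2), (3, 0), (4, 3)]
```

Because every frame is searched on its own, a match of this kind can recover a point after an occlusion, but it can also jump between similar-looking regions. That is the motivation for following it with a refinement stage that uses local correlations.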

Code

deepmind/tapnet (official) · Quickstart in Colab · 1,052 stars

Tasks

Motion Estimation

Visual Tracking

Datasets

Kinetics

DAVIS

Kubric

TAP-Vid

RGB-Stacking

Results from the Paper

Ranked #1 on Visual Tracking on Kinetics

Methods

Depthwise Convolution • Diffusion
