TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Object Tracking	GOT-10k	MixFormer-1k	Average Overlap	71.2	# 15
Visual Object Tracking	GOT-10k	MixFormer-1k	Success Rate 0.5	79.9	# 12
Visual Object Tracking	GOT-10k	MixFormer-1k	Success Rate 0.75	65.8	# 11
Visual Object Tracking	GOT-10k	MixFormer-L	Average Overlap	75.6	# 10
Visual Object Tracking	GOT-10k	MixFormer-L	Success Rate 0.5	85.73	# 4
Visual Object Tracking	GOT-10k	MixFormer-L	Success Rate 0.75	72.8	# 6
Visual Object Tracking	GOT-10k	MixFormer	Average Overlap	70.7	# 17
Visual Object Tracking	GOT-10k	MixFormer	Success Rate 0.5	80.0	# 10
Visual Object Tracking	GOT-10k	MixFormer	Success Rate 0.75	67.8	# 10
Visual Object Tracking	LaSOT	MixFormer-L	AUC	70.1	# 19
Visual Object Tracking	LaSOT	MixFormer-L	Normalized Precision	79.9	# 12
Visual Object Tracking	LaSOT	MixFormer-L	Precision	76.3	# 11
Visual Object Tracking	TrackingNet	MixFormer-L	Precision	83.1	# 10
Visual Object Tracking	TrackingNet	MixFormer-L	Normalized Precision	88.9	# 7
Visual Object Tracking	TrackingNet	MixFormer-L	Accuracy	83.9	# 10
Visual Object Tracking	UAV123	MixFormer	AUC	0.704	# 8
Visual Object Tracking	UAV123	MixFormer	Precision	0.918	# 2
Semi-Supervised Video Object Segmentation	VOT2020	MixFormer-L	EAO	0.555	# 11
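
The GOT-10k rows above report Average Overlap and Success Rate at overlap thresholds 0.5 and 0.75. As a rough illustration of what these numbers measure, the sketch below computes them from per-frame boxes. It assumes boxes in [x, y, width, height] format and pools all frames equally. The function names and toy boxes are hypothetical, and the official benchmark toolkit may aggregate differently (for example per sequence), so this is not a reproduction of the evaluation protocol.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(ious):
    """Mean IoU over frames, as a percentage."""
    return 100.0 * sum(ious) / len(ious)


def success_rate(ious, threshold):
    """Percentage of frames whose IoU exceeds the threshold."""
    return 100.0 * sum(1 for v in ious if v > threshold) / len(ious)


# Toy data (hypothetical): one perfect frame, one partial overlap, one miss.
predictions = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
ground_truth = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
ious = [iou_xywh(p, g) for p, g in zip(predictions, ground_truth)]

ao = average_overlap(ious)
sr_50 = success_rate(ious, 0.5)
sr_75 = success_rate(ious, 0.75)
```

Under these assumptions, a tracker that drifts gradually loses Average Overlap first, while Success Rate 0.75 penalizes any frame that is not tightly localized.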

MixFormer: End-to-End Tracking with Iterative Mixed Attention

CVPR 2022 · Yutao Cui, Cheng Jiang, LiMin Wang, Gangshan Wu

Tracking often uses a multi-stage pipeline of feature extraction, target information integration, and bounding box estimation. To simplify this pipeline and unify feature extraction with target information integration, we present a compact tracking framework, termed MixFormer, built upon transformers. Our core design exploits the flexibility of attention operations through a Mixed Attention Module (MAM) that performs feature extraction and target information integration simultaneously. This synchronous modeling scheme makes it possible to extract target-specific discriminative features and to perform extensive communication between the target and the search area. Based on MAM, we build the MixFormer tracking framework simply by stacking multiple MAMs with progressive patch embedding and placing a localization head on top. In addition, to handle multiple target templates during online tracking, we devise an asymmetric attention scheme in MAM to reduce computational cost, and propose an effective score prediction module to select high-quality templates. MixFormer sets a new state of the art on five tracking benchmarks: LaSOT, TrackingNet, VOT2020, GOT-10k, and UAV123. In particular, MixFormer-L achieves an NP score of 79.9% on LaSOT and 88.9% on TrackingNet, and an EAO of 0.555 on VOT2020. We also perform in-depth ablation studies to demonstrate the effectiveness of simultaneous feature extraction and information integration. Code and trained models are publicly available at https://github.com/MCG-NJU/MixFormer.
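
To make the idea of mixed attention more concrete, here is a minimal single-head sketch in NumPy. It is not the released implementation: the function name `mixed_attention`, the plain linear projections, and the token shapes are assumptions chosen for illustration. The asymmetric variant shown here is one reading of the scheme, assumed rather than taken from the abstract: target tokens attend only to target tokens, while search tokens attend to both.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mixed_attention(target, search, wq, wk, wv, asymmetric=True):
    """Single-head attention over target and search tokens (illustrative).

    target: (Nt, d) tokens from the template; search: (Ns, d) tokens from
    the search area; wq, wk, wv: (d, d) projection matrices (assumed linear).
    Returns the updated (target, search) token arrays.
    """
    d = wq.shape[1]
    qt, kt, vt = target @ wq, target @ wk, target @ wv
    qs, ks, vs = search @ wq, search @ wk, search @ wv

    # Keys and values from both sources, so a query can mix information.
    k_mix = np.concatenate([kt, ks], axis=0)
    v_mix = np.concatenate([vt, vs], axis=0)

    # Search queries always see target and search tokens.
    search_out = softmax(qs @ k_mix.T / np.sqrt(d)) @ v_mix

    if asymmetric:
        # Assumed asymmetric scheme: target queries ignore search tokens,
        # so the target branch does not depend on the current frame.
        target_out = softmax(qt @ kt.T / np.sqrt(d)) @ vt
    else:
        target_out = softmax(qt @ k_mix.T / np.sqrt(d)) @ v_mix
    return target_out, search_out


rng = np.random.default_rng(0)
d, n_target, n_search = 8, 4, 16
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
target = rng.normal(size=(n_target, d))
search_a = rng.normal(size=(n_search, d))
search_b = rng.normal(size=(n_search, d))

t_a, s_a = mixed_attention(target, search_a, wq, wk, wv, asymmetric=True)
t_b, s_b = mixed_attention(target, search_b, wq, wk, wv, asymmetric=True)
t_sym, _ = mixed_attention(target, search_b, wq, wk, wv, asymmetric=False)
```

In this sketch the asymmetric target output is identical for `search_a` and `search_b`, which suggests why such a scheme can save computation during online tracking: template features could be computed once and reused across frames.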

Code

MCG-NJU/MixFormer official

Tasks

Semi-Supervised Video Object Segmentation

Visual Object Tracking

Datasets

ImageNet

MS COCO

OTB

LaSOT

GOT-10k

TrackingNet

VOTChallenge

UAV123

VOT2020

Results from the Paper

Ranked #6 on Visual Object Tracking on UAV123

