TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Recognition	EPIC-KITCHENS-100	Mformer-L	Action@1	44.1	# 19
Action Recognition	EPIC-KITCHENS-100	Mformer-L	Verb@1	67.1	# 17
Action Recognition	EPIC-KITCHENS-100	Mformer-L	Noun@1	57.6	# 15
Action Recognition	EPIC-KITCHENS-100	Mformer-HR	Action@1	44.5	# 15
Action Recognition	EPIC-KITCHENS-100	Mformer-HR	Verb@1	67.0	# 19
Action Recognition	EPIC-KITCHENS-100	Mformer-HR	Noun@1	58.5	# 13
Action Recognition	EPIC-KITCHENS-100	Mformer	Action@1	43.1	# 22
Action Recognition	EPIC-KITCHENS-100	Mformer	Verb@1	66.7	# 20
Action Recognition	EPIC-KITCHENS-100	Mformer	Noun@1	56.5	# 18
Action Classification	Kinetics-400	Motionformer-HR	Acc@1	81.1	# 81
Action Classification	Kinetics-400	Motionformer-HR	Acc@5	95.2	# 50
Action Recognition	Something-Something V2	Mformer-HR	Top-1 Accuracy	67.1	# 69
Action Recognition	Something-Something V2	Mformer-HR	Top-5 Accuracy	90.6	# 51
Action Recognition	Something-Something V2	Mformer-HR	Parameters	N/A	# 37
Action Recognition	Something-Something V2	Mformer-HR	GFLOPs	958.8x3	# 6
Action Recognition	Something-Something V2	Mformer-L	Top-1 Accuracy	68.1	# 53
Action Recognition	Something-Something V2	Mformer-L	Top-5 Accuracy	91.2	# 41
Action Recognition	Something-Something V2	Mformer-L	Parameters	N/A	# 37
Action Recognition	Something-Something V2	Mformer-L	GFLOPs	1181x3	# 6
Action Recognition	Something-Something V2	Mformer	Top-1 Accuracy	66.5	# 77
Action Recognition	Something-Something V2	Mformer	Top-5 Accuracy	90.1	# 62

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/keeping-your-eye-on-the-ball-trajectory/action-recognition-on-epic-kitchens-100)](https://paperswithcode.com/sota/action-recognition-on-epic-kitchens-100?p=keeping-your-eye-on-the-ball-trajectory)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/keeping-your-eye-on-the-ball-trajectory/action-recognition-in-videos-on-something)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something?p=keeping-your-eye-on-the-ball-trajectory)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/keeping-your-eye-on-the-ball-trajectory/action-classification-on-kinetics-400)](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=keeping-your-eye-on-the-ball-trajectory)`

Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers

NeurIPS 2021 · Mandela Patrick, Dylan Campbell, Yuki M. Asano, Ishan Misra, Florian Metze, Christoph Feichtenhofer, Andrea Vedaldi, João F. Henriques

In video transformers, the time dimension is often treated in the same way as the two spatial dimensions. However, in a scene where objects or the camera may move, a physical point imaged at one location in frame $t$ may be entirely unrelated to what is found at that location in frame $t+k$. These temporal correspondences should be modeled to facilitate learning about dynamic scenes. To this end, we propose a new drop-in block for video transformers, trajectory attention, that aggregates information along implicitly determined motion paths. We additionally propose a new method to address the quadratic dependence of computation and memory on the input size, which is particularly important for high-resolution or long videos. While these ideas are useful in a range of settings, we apply them to the specific task of video action recognition with a transformer model and obtain state-of-the-art results on the Kinetics, Something-Something V2, and Epic-Kitchens datasets. Code and models are available at: https://github.com/facebookresearch/Motionformer
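
The idea of aggregating information "along implicitly determined motion paths" can be illustrated with a small NumPy sketch. This is not the code from the Motionformer repository; it is a simplified, single-head illustration under stated assumptions: tokens are arranged as (frames, spatial positions, channels), there is no class token, and the abstract's method for reducing the quadratic cost is not included. The function name `trajectory_attention` and the weight names (`Wq`, `Wk`, `Wv`, `Wq2`, `Wk2`, `Wv2`) are hypothetical. Step 1 lets each query attend over space separately in every frame, giving one "trajectory token" per frame; step 2 then attends over those tokens along time.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def trajectory_attention(x, Wq, Wk, Wv, Wq2, Wk2, Wv2):
    """Illustrative single-head sketch (assumed layout, not the official code).

    x: array of shape (T, S, D) holding S spatial tokens for each of T frames.
    All weight matrices have shape (D, D).
    """
    T, S, D = x.shape
    scale = 1.0 / np.sqrt(D)
    q, k, v = x @ Wq, x @ Wk, x @ Wv                       # each (T, S, D)

    # Step 1: every query (t, s) attends over the spatial positions z of
    # each frame u independently, so the softmax runs over space only.
    logits = np.einsum("tsd,uzd->tsuz", q, k) * scale      # (T, S, T, S)
    attn_space = softmax(logits, axis=-1)
    # One pooled "trajectory token" per (query, frame) pair.
    traj = np.einsum("tsuz,uzd->tsud", attn_space, v)      # (T, S, T, D)

    # Step 2: pool along time. The query comes from the trajectory token
    # at the query's own frame; keys and values from all frames.
    own = np.stack([traj[t, :, t, :] for t in range(T)])   # (T, S, D)
    q2, k2, v2 = own @ Wq2, traj @ Wk2, traj @ Wv2
    logits_t = np.einsum("tsd,tsud->tsu", q2, k2) * scale  # (T, S, T)
    attn_time = softmax(logits_t, axis=-1)
    out = np.einsum("tsu,tsud->tsd", attn_time, v2)        # (T, S, D)
    return out, attn_space, attn_time

# Toy usage with random inputs: 4 frames, 6 spatial tokens, 8 channels.
rng = np.random.default_rng(0)
T, S, D = 4, 6, 8
x = rng.standard_normal((T, S, D))
weights = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(6)]
out, attn_space, attn_time = trajectory_attention(x, *weights)
print(out.shape)
```

Note that the intermediate `logits` array has T·S × T·S entries, which is the quadratic dependence on input size that the abstract refers to; the sketch makes no attempt to reduce it.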

Code

facebookresearch/Motionformer official

220

facebookresearch/xformers

7,560

Tasks

Action Classification

Action Recognition

Temporal Action Localization

Datasets

ImageNet

Kinetics

Kinetics 400

Something-Something V2

EPIC-KITCHENS-100

Results from the Paper

Ranked #15 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)
