Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Action Anticipation	EPIC-KITCHENS-100	AVT+	Recall@5	15.9	#4
Action Anticipation	EPIC-KITCHENS-100 (test)	AVT++	Recall@5	16.7	#2
Action Anticipation	EPIC-KITCHENS-100 (test)	AVT+	Recall@5	12.6	#5
Action Anticipation	EPIC-KITCHENS-55 (Seen test set (S1))	AVT+	Top 1 Accuracy - Verb	34.36	#3
Action Anticipation	EPIC-KITCHENS-55 (Seen test set (S1))	AVT+	Top 1 Accuracy - Noun	20.16	#4
Action Anticipation	EPIC-KITCHENS-55 (Seen test set (S1))	AVT+	Top 1 Accuracy - Act.	16.84	#2
Action Anticipation	EPIC-KITCHENS-55 (Seen test set (S1))	AVT+	Top 5 Accuracy - Verb	80.03	#2
Action Anticipation	EPIC-KITCHENS-55 (Seen test set (S1))	AVT+	Top 5 Accuracy - Noun	51.57	#3
Action Anticipation	EPIC-KITCHENS-55 (Seen test set (S1))	AVT+	Top 5 Accuracy - Act.	36.52	#2
Action Anticipation	EPIC-KITCHENS-55 (Unseen test set (S2))	AVT+	Top 1 Accuracy - Verb	30.66	#2
Action Anticipation	EPIC-KITCHENS-55 (Unseen test set (S2))	AVT+	Top 1 Accuracy - Noun	15.64	#2
Action Anticipation	EPIC-KITCHENS-55 (Unseen test set (S2))	AVT+	Top 1 Accuracy - Act.	10.41	#2
Action Anticipation	EPIC-KITCHENS-55 (Unseen test set (S2))	AVT+	Top 5 Accuracy - Verb	72.17	#2
Action Anticipation	EPIC-KITCHENS-55 (Unseen test set (S2))	AVT+	Top 5 Accuracy - Noun	40.76	#2
Action Anticipation	EPIC-KITCHENS-55 (Unseen test set (S2))	AVT+	Top 5 Accuracy - Act.	24.27	#1
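The table above mixes two kinds of metric: top-k accuracy over verb, noun, and action classes (EPIC-KITCHENS-55) and Recall@5 (EPIC-KITCHENS-100). The sketch below shows how both can be computed from per-class scores, assuming Recall@5 denotes class-mean top-5 recall, i.e. top-5 recall computed separately for each ground-truth class and then averaged. The function names and toy scores are hypothetical, and the official benchmark evaluation scripts may average over a different set of classes.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def _top_k_classes(row: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring classes for one sample."""
    return sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose ground-truth label is among the top-k predictions."""
    hits = sum(label in _top_k_classes(row, k) for row, label in zip(scores, labels))
    return hits / len(labels)


def class_mean_top_k_recall(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Top-k recall computed per ground-truth class, then averaged over those classes."""
    hits: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for row, label in zip(scores, labels):
        counts[label] += 1
        hits[label] += label in _top_k_classes(row, k)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Toy example (made-up scores over 3 classes, not benchmark data).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
labels = [1, 0, 1, 2]
print(top_k_accuracy(scores, labels, 2))           # 0.75
print(class_mean_top_k_recall(scores, labels, 2))  # 0.666...
```

Class-mean recall weights rare classes as heavily as frequent ones, so on long-tailed data it can be noticeably lower than plain top-k accuracy, as the toy example already shows (about 0.67 versus 0.75 at k = 2).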

Anticipative Video Transformer

ICCV 2021 · Rohit Girdhar, Kristen Grauman

We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to anticipate future actions. We train the model jointly to predict the next action in a video sequence, while also learning frame feature encoders that are predictive of successive future frames' features. Compared to existing temporal aggregation strategies, AVT has the advantage of maintaining the sequential progression of observed actions while still capturing long-range dependencies, both of which are critical for the anticipation task. Through extensive experiments, we show that AVT obtains the best reported performance on four popular action anticipation benchmarks: EpicKitchens-55, EpicKitchens-100, EGTEA Gaze+, and 50-Salads; it also won first place in the EpicKitchens-100 CVPR'21 challenge.
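The abstract describes a joint objective: predict the next action while making frame features predictive of the following frames' features. A minimal plain-Python sketch of such a loss follows, under stated assumptions: the output at position t is regressed onto the feature of frame t + 1 with mean squared error, the next action is scored with softmax cross-entropy, and the two terms are combined with a hypothetical `feature_weight`. This illustrates the idea only; it is not the loss as implemented in facebookresearch/AVT.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def anticipative_loss(
    predicted_features: Sequence[Vector],
    frame_features: Sequence[Vector],
    next_action_logits: Sequence[float],
    next_action: int,
    feature_weight: float = 1.0,  # hypothetical weighting between the two terms
) -> float:
    """Next-action cross-entropy plus future-feature regression (illustrative only).

    predicted_features[t] is the model output at position t, which has attended
    to frames 0..t; it is trained to match frame_features[t + 1]. The last
    output has no observed successor frame, so it is excluded from the feature
    term; the next action is scored separately via next_action_logits.
    """
    pairs = list(zip(predicted_features[:-1], frame_features[1:]))
    feature_loss = sum(mse(p, f) for p, f in pairs) / len(pairs) if pairs else 0.0
    action_loss = cross_entropy(next_action_logits, next_action)
    return action_loss + feature_weight * feature_loss
```

If every prediction exactly matches the next frame's feature, only the action term remains; with uniform logits over two classes that term is ln 2.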

Code

facebookresearch/AVT (official)

Tasks

Action Anticipation

Datasets

EPIC-KITCHENS-100

EGTEA

Results from the Paper

Ranked #2 for Action Anticipation on EPIC-KITCHENS-100 (test) (using extra training data)

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
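Several of the listed methods (scaled dot-product attention, softmax) come together in the attention step. For anticipation, a natural reading of the abstract's "attends to the previously observed video" is causal masking: each frame position may attend only to itself and earlier frames. The single-head sketch below assumes exactly that and omits learned projections, multiple heads, and position encodings; the function names are illustrative, not taken from the AVT code.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask.

    Row t of q, k, v corresponds to frame t. Position t attends only to
    positions 0..t, so no output depends on frames that come later.
    """
    d = len(q[0])
    out: Matrix = []
    for t, q_t in enumerate(q):
        # Scaled dot products against the current and earlier frames only.
        scores = [sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d) for s in range(t + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(len(v[0]))])
    return out
```

With all-zero queries and keys the weights are uniform over the visible prefix, so `causal_attention([[0.0], [0.0]], [[0.0], [0.0]], [[1.0], [3.0]])` returns `[[1.0], [2.0]]`: the first frame sees only itself, while the second averages both.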
