EPIC-KITCHENS-100

Introduced by Damen et al. in Rescaling Egocentric Vision

This paper introduces the pipeline used to scale the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments using head-mounted cameras. Compared to its previous version (EPIC-KITCHENS-55), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete (128% more action segments) annotations of fine-grained actions. This collection also enables evaluating the "test of time", i.e., whether models trained on data collected in 2018 can generalise to new footage collected under the same hypotheses, albeit "two years on". The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), and unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.
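
To put the headline figures in proportion, the short Python sketch below derives a few average rates from them. It uses only the rounded numbers quoted above (100 hours, 20M frames, 90K actions, 700 videos), so the results are rough approximations and not statistics reported by the authors; the variable names are illustrative.

```python
# Rounded headline figures quoted in the description above.
HOURS = 100
FRAMES = 20_000_000
ACTIONS = 90_000
VIDEOS = 700

minutes = HOURS * 60
seconds = HOURS * 3600

# Averages derived from the rounded figures only (approximate by construction).
stats = {
    "actions_per_minute": ACTIONS / minutes,
    "avg_video_minutes": minutes / VIDEOS,
    "avg_frames_per_second": FRAMES / seconds,
    "footage_seconds_per_action": seconds / ACTIONS,
}

for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

On these rounded inputs the averages come out to about 15 annotated actions per minute, videos of roughly 8.6 minutes, and one annotated action for every 4 seconds of footage. Because action segments can overlap, the last figure describes annotation density, not the duration of an action.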

Benchmarks

Task	Dataset Variant	Best Model
Action Recognition	EPIC-KITCHENS-100	Avion
Multi-Instance Retrieval	EPIC-KITCHENS-100	Avion
Temporal Action Localization	EPIC-KITCHENS-100	AdaTAD
Action Anticipation	EPIC-KITCHENS-100	InAViT
Unsupervised Domain Adaptation	EPIC-KITCHENS-100	TranSVAE
Audio Classification	EPIC-KITCHENS-100	Audiovisual Masked Autoencoder
Open Vocabulary Action Recognition	EPIC-KITCHENS-100	OAP+AOP