AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

CVPR 2018

Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels; multiple labels per person occur frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations, with possibly multiple annotations per person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) the use of movies to gather a varied set of action representations. We will release the dataset publicly.
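To illustrate the annotation scheme described above (one spatio-temporal box per person per keyframe, frequently carrying multiple atomic-action labels), here is a minimal sketch of grouping AVA-style CSV rows by person and timestamp. The column layout (`video_id, timestamp_sec, x1, y1, x2, y2, action_id, person_id`, with box coordinates normalized to [0, 1]) is an assumption based on the publicly released CSV files, and the clip ID below is hypothetical:

```python
import csv
import io
from collections import defaultdict

# Hypothetical AVA-style rows; the column layout is assumed:
# video_id, timestamp_sec, x1, y1, x2, y2, action_id, person_id
# (box coordinates normalized to [0, 1]).
SAMPLE = """\
clip_001,0902,0.077,0.151,0.283,0.811,80,1
clip_001,0902,0.077,0.151,0.283,0.811,12,1
clip_001,0903,0.226,0.032,0.366,0.497,17,2
"""

def group_labels(csv_text):
    """Group rows by (video, timestamp, person), collecting all action
    labels for each person box; one person often has several labels."""
    groups = defaultdict(lambda: {"box": None, "actions": []})
    for vid, ts, x1, y1, x2, y2, action, person in csv.reader(io.StringIO(csv_text)):
        key = (vid, int(ts), int(person))
        groups[key]["box"] = tuple(map(float, (x1, y1, x2, y2)))
        groups[key]["actions"].append(int(action))
    return dict(groups)

groups = group_labels(SAMPLE)
# Person 1 at t=902 carries two atomic-action labels on the same box.
print(groups[("clip_001", 902, 1)]["actions"])
```

Grouping by person and keyframe rather than by row reflects the dataset's design: the atomic actions are multi-label per person, so a detector's output should be scored against a label set, not a single class.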

Full paper
