Action Detection

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels, with multiple labels per person occurring frequently.
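Because one person can carry several labels at the same moment, an annotation is best treated as a set of actions per (video, timestamp, person) key rather than as a single class. The sketch below groups flat label rows that way. It assumes a CSV row layout of video_id, timestamp, four box coordinates, action_id, person_id; treat that column order as an assumption to check against the released files. The rows and the `group_labels` helper are made up for illustration and are not real AVA data.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in an ASSUMED column order (not real AVA annotations):
# video_id, timestamp_sec, x1, y1, x2, y2, action_id, person_id
SAMPLE_CSV = """\
vid_a,0902,0.077,0.151,0.283,0.811,80,1
vid_a,0902,0.077,0.151,0.283,0.811,11,1
vid_a,0902,0.332,0.194,0.481,0.891,12,0
vid_a,0903,0.078,0.149,0.285,0.813,80,1
"""


def group_labels(rows):
    """Hypothetical helper: collect every action_id given to one person at one timestamp."""
    labels = defaultdict(set)
    for video_id, ts, *_box, action_id, person_id in rows:
        key = (video_id, int(ts), int(person_id))
        labels[key].add(int(action_id))  # a set, since labels are multi-label
    return dict(labels)


labels = group_labels(csv.reader(io.StringIO(SAMPLE_CSV)))
for key, actions in sorted(labels.items()):
    print(key, sorted(actions))
```

With these rows, person 1 at timestamp 902 ends up with two labels ({11, 80}), which is the multi-label situation the summary describes.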

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

YOWO is a single-stage architecture with two branches that extract temporal and spatial information concurrently; it predicts bounding boxes and action probabilities directly from video clips in a single evaluation.
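The data flow of such a two-branch, single-stage detector can be sketched in NumPy. This is a toy under stated assumptions, not YOWO's implementation: random linear projections stand in for the real CNN branches, the temporal branch is assumed to see the whole clip while the spatial branch sees only its last (key) frame, plain channel concatenation stands in for the fusion step, and the grid, anchor, and class sizes are hypothetical. The point is that one forward pass yields, for every grid cell and anchor, box offsets, a confidence, and per-class action scores.

```python
import numpy as np

# Toy sizes chosen for illustration only (assumptions, not YOWO's settings).
T, H, W, C = 8, 32, 32, 3        # frames in the clip, frame size, channels
S = 4                            # predictions are made on an S x S grid
NUM_ANCHORS, NUM_CLASSES = 2, 3  # anchors per cell, action classes
F_TEMPORAL, F_SPATIAL = 16, 16   # feature channels produced by each branch


def pool_to_grid(x, s):
    """Average-pool an (H, W, D) map to (s, s, D); H and W must be divisible by s."""
    h, w, d = x.shape
    return x.reshape(s, h // s, s, w // s, d).mean(axis=(1, 3))


def temporal_branch(clip, weight):
    """Stand-in for a 3D CNN: folds all T frames into channels, then projects."""
    t, h, w, c = clip.shape
    stacked = clip.transpose(1, 2, 0, 3).reshape(h, w, t * c)  # (H, W, T*C)
    return np.maximum(pool_to_grid(stacked, S) @ weight, 0.0)   # ReLU


def spatial_branch(key_frame, weight):
    """Stand-in for a 2D CNN: sees only the key frame."""
    return np.maximum(pool_to_grid(key_frame, S) @ weight, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def yowo_like_forward(clip, params):
    """One evaluation: clip in, per-cell, per-anchor boxes and action scores out."""
    temporal = temporal_branch(clip, params["w3d"])       # (S, S, F_TEMPORAL)
    spatial = spatial_branch(clip[-1], params["w2d"])     # (S, S, F_SPATIAL)
    fused = np.concatenate([temporal, spatial], axis=-1)  # simple channel fusion
    raw = (fused @ params["head"]).reshape(S, S, NUM_ANCHORS, 5 + NUM_CLASSES)
    boxes = raw[..., :4]                  # box offsets (x, y, w, h), not decoded
    objectness = sigmoid(raw[..., 4])     # confidence that an actor is present
    action_probs = sigmoid(raw[..., 5:])  # independent per-class action scores
    return boxes, objectness, action_probs


rng = np.random.default_rng(0)
params = {
    "w3d": rng.normal(size=(T * C, F_TEMPORAL)),
    "w2d": rng.normal(size=(C, F_SPATIAL)),
    "head": rng.normal(size=(F_TEMPORAL + F_SPATIAL, NUM_ANCHORS * (5 + NUM_CLASSES))),
}
clip = rng.random((T, H, W, C))
boxes, objectness, action_probs = yowo_like_forward(clip, params)
print(boxes.shape, objectness.shape, action_probs.shape)
# prints (4, 4, 2, 4) (4, 4, 2) (4, 4, 2, 3)
```

Independent sigmoid scores per class are a choice made here so that one actor can receive several actions at once; a softmax over classes would be the natural alternative for single-label datasets.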