…Prediction Dataset, with up to 66,560 sequences containing 550,400 eye-images and respective gaze vectors, created to foster research in spatio-temporal gaze estimation and prediction approaches; and 2) Eye Segmentation Dataset, consisting of 200 sequences sampled at 5 Hz, with up to 29,500 images, of which 5% contain a semantic segmentation label, devised to encourage the use of temporal information to propagate labels
2 PAPERS • NO BENCHMARKS YET
…AVA Speech densely annotates audio-based speech activity in AVA v1.0 videos, and explicitly labels 3 background noise conditions, resulting in ~46K labeled segments spanning 45 hours of data.
93 PAPERS • 7 BENCHMARKS