3 dataset results for face recog AND Self-Supervised Learning

Contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not speaking, along with whether the speech is audible. The dataset contains about 3.65 million human-labeled frames (about 38.5 hours of face tracks) and the corresponding audio.

19 PAPERS • 1 BENCHMARK

YFCC-CelebA

…The existence of such large weakly labeled databases has become important for training face recognition algorithms. Starting from the publicly available YFCC100M, we propose a weakly labeled subset for multi-label face recognition with self-supervised methods.

1 PAPER • NO BENCHMARKS YET

AVA (Atomic Visual Actions)

…AVA ActiveSpeaker: associates speaking activity with a visible face in the AVA v1.0 videos, yielding 3.65 million labeled frames across ~39K face tracks.

94 PAPERS • 7 BENCHMARKS
