…The segments vary in length from 3 to 10 seconds. In each clip, the only visible face in the video and the only audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4,700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.
35 PAPERS • NO BENCHMARKS YET
…The data includes RGB, depth, foreground segmentations and full-body skeletons. Both the training and testing labels are noisy because they come from the Kinect sensor.
0 PAPERS • NO BENCHMARKS YET
…It contains over 10 million segments of multilingual open data. The data was collected from sites that allow free use and reuse of their content, as well as from public-sector websites.
2 PAPERS • NO BENCHMARKS YET