UDIVA is a new non-acted dataset of face-to-face dyadic interactions, where interlocutors perform competitive and collaborative tasks with different behavior elicitation and cognitive workload.
6 PAPERS • NO BENCHMARKS YET
…The dataset is audio-visual, so it is also useful for a number of other applications, for example visual speech synthesis, speech separation, cross-modal transfer from face to voice or vice versa, and training face recognition from video to complement existing face recognition datasets.
496 PAPERS • 5 BENCHMARKS
…The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.
35 PAPERS • NO BENCHMARKS YET
…Аментес, Илья Лубенец, Никита Давидчук}, title = {Open artificial intelligence library for analyzing and detecting emotional shades of human speech}, year = {2022}, publisher = {Hugging Face}, journal = {Hugging Face Hub}, howpublished = {\url{https://huggingface.com/aniemore/Aniemore}}, email = {hello@socialcode.ru} }
0 PAPERS • NO BENCHMARKS YET
…covers the user manipulation and interaction with the robot. An RGB-D camera mounted on top of the robot provides a top view of the whole scenario. An HD RGB camera points at the user's head to capture face
…contains AR glasses egocentric multi-channel microphone array audio, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head and face
15 PAPERS • 4 BENCHMARKS
…The dense dynamic face scans were acquired at 25 frames per second and the RMS error in the 3D reconstruction is about 0.5 mm.
5 PAPERS • 1 BENCHMARK
The German Lipreading dataset consists of 250,000 publicly available videos of the faces of speakers of the Hessian Parliament, which were processed for word-level lip reading using an automatic pipeline
5 PAPERS • NO BENCHMARKS YET
…The dataset is available on Zenodo, and connectors are available for the Hugging Face Hub.
2 PAPERS • NO BENCHMARKS YET
The Blind Audio-Visual Localization (BAVL) Dataset consists of 20 audio-visual recordings of sound sources, which can be talking faces or musical instruments.
…objective of the 2016 challenge was to better understand different VC techniques built on a freely-available common dataset to look at a common goal, and to share views about unsolved problems and challenges faced
3 PAPERS • NO BENCHMARKS YET
…The dataset is available on Hugging Face and GitHub. Data Fields file_id - filename, i.e.
1 PAPER • NO BENCHMARKS YET