A dataset for studying the relationship between voice and 3D face structure. It contains about 1.4K identities with their 3D face models and voice data. The 3D face models are fitted to VGGFace images using the Basel Face Model (BFM), and the voice data are processed from VoxCeleb.
1 PAPER • 1 BENCHMARK
This is a 16.2-million-frame (50-hour) multimodal dataset of two-person, face-to-face spontaneous conversations. The dataset features synchronized body and finger motion as well as audio data.
4 PAPERS • NO BENCHMARKS YET
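The frame count and duration quoted above jointly imply the capture rate. A minimal sanity check, assuming only the two figures stated in the description (16.2 million frames over 50 hours):

```python
# Back-of-envelope check of the capture rate implied by the figures above.
# Uses only the quoted totals; nothing here is documented dataset metadata.

total_frames = 16_200_000
total_seconds = 50 * 3600  # 50 hours

fps = total_frames / total_seconds
print(f"implied frame rate: {fps:.0f} fps")  # → implied frame rate: 90 fps
```

A 90 fps figure is consistent with motion-capture-grade recording rather than ordinary video.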
…The segments vary in length from 3 to 10 seconds, and in each clip the only visible face in the video and the only audible voice in the soundtrack belong to a single speaker. In total, the dataset contains roughly 4,700 hours of video segments from approximately 150,000 distinct speakers, spanning a wide variety of people, languages, and face poses.
36 PAPERS • NO BENCHMARKS YET
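The stated clip lengths and total duration bound the number of segments, even though the description does not give an exact count. A rough estimate, assuming every clip falls in the stated 3–10 second range and the total is about 4,700 hours:

```python
# Rough bounds on the segment count, derived only from the figures in the
# description above (3-10 s clips, ~4700 hours total); the true number of
# segments is not stated there.

total_seconds = 4700 * 3600
min_segments = total_seconds // 10   # if every clip were the maximum 10 s
max_segments = total_seconds // 3    # if every clip were the minimum 3 s
print(f"{min_segments:,} to {max_segments:,} segments")
# → 1,692,000 to 5,640,000 segments
```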
VOCASET is a 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio.
43 PAPERS • 1 BENCHMARK
…Amentes, Ilya Lubenets, Nikita Davidchuk}, title = {An Open Artificial Intelligence Library for Analyzing and Detecting Emotional Shades of Human Speech}, year = {2022}, publisher = {Hugging Face}, journal = {Hugging Face Hub}, howpublished = {\url{https://huggingface.com/aniemore/Aniemore}}, email = {hello@socialcode.ru} }
0 PAPERS • NO BENCHMARKS YET
…The images inside each zip file are face-only and are provided in three convenient sizes: 224 × 224, 512 × 512, and 1024 × 1024 pixels. The recording protocol includes several occlusion scenarios: the participant covers their eyes with a hand, then covers the left half, the right half, and finally the lower half of the face; the participant moves a green cloth in front of their face; and the participant puts on a face mask, counts from 1 to 10 out loud, and then removes it. FSGAN (Face Swapping Generative Adversarial Network): this corresponds to the second version of FSGAN. Access its release at https://github.com/wyhsirius/LIA. As a rule of thumb, in the face-swap videos the impostor provides the outer face and the target provides the inner face.
1 PAPER • NO BENCHMARKS YET
…contains egocentric multi-channel microphone-array audio captured with AR glasses, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head and face
15 PAPERS • 4 BENCHMARKS
…The test set is available on Hugging Face in BIO format: qmeeus/MSNER @inproceedings{MSNER, author = {Meeus, Quentin and Moens, Marie-Francine and Van hamme, Hugo}, booktitle = {20th Joint ACL-ISO Workshop
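In BIO format, each token is tagged as Beginning, Inside, or Outside an entity. A minimal, stdlib-only sketch of decoding such tags into entity spans; this is an illustrative helper for BIO-formatted data like the MSNER test set, not code shipped with the dataset, and the example tokens are invented:

```python
def bio_to_spans(tokens, tags):
    """Decode BIO tags into (label, start, end) spans; end is exclusive.

    Illustrative helper, not part of the MSNER release.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:          # close any open entity
                spans.append((label, start, i))
            start, label = i, tag[2:]      # open a new entity
        elif tag.startswith("I-") and label == tag[2:]:
            continue                       # extend the current entity
        else:                              # "O", or an I- tag with no open entity
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:                  # entity running to the end
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Quentin", "Meeus", "works", "in", "Leuven"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))  # → [('PER', 0, 2), ('LOC', 4, 5)]
```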
…The dataset is available on Hugging Face and GitHub. Data Fields: file_id - filename, i.e.