Room-Across-Room (RxR) is a multilingual dataset for Vision-and-Language Navigation (VLN) for Matterport3D environments. In contrast to related datasets such as Room-to-Room (R2R), RxR is 10x larger, multilingual (English, Hindi and Telugu), with longer and more variable paths, and it includes and fine-grained visual groundings that relate each word to pixels/surfaces in the environment.
43 PAPERS • 1 BENCHMARK
Acappella comprises around 46 hours of a cappella solo singing videos sourced from YouTbe, sampled across different singers and languages. Four languages are considered: English, Spanish, Hindi and others.
5 PAPERS • NO BENCHMARKS YET
This dataset is an extension of MASAC, a multimodal, multi-party, Hindi-English code-mixed dialogue dataset compiled from the popular Indian TV show, ‘Sarabhai v/s Sarabhai’. WITS was created by augmenting MASAC with natural language explanations for each sarcastic dialogue. The dataset consists of the transcribed sarcastic dialogues from 55 episodes of the TV show, along with audio and video multimodal signals. It was designed to facilitate Sarcasm Explanation in Dialogue (SED), a novel task aimed at generating a natural language explanation for a given sarcastic dialogue, that spells out the intended irony. Each data instance in WITS is associated with a corresponding video, audio, and textual transcript where the last utterance is sarcastic in nature. All the final selected explanations contain the following attributes:
4 PAPERS • 2 BENCHMARKS
MuCo-VQA consist of large-scale (3.7M) multilingual and code-mixed VQA datasets in multiple languages: Hindi (hi), Bengali (bn), Spanish (es), German (de), French (fr) and code-mixed language pairs: en-hi, en-bn, en-fr, en-de and en-es.
2 PAPERS • NO BENCHMARKS YET