Search Results for author: Clara Rivera

Found 11 papers, 4 papers with code

Writing System and Speaker Metadata for 2,800+ Language Varieties

1 code implementation LREC 2022 Daan van Esch, Tamar Lucassen, Sebastian Ruder, Isaac Caswell, Clara Rivera

We describe an open-source dataset providing metadata for about 2,800 language varieties used in the world today.

MD3: The Multi-Dialect Dataset of Dialogues

no code implementations 19 May 2023 Jacob Eisenstein, Vinodkumar Prabhakaran, Clara Rivera, Dorottya Demszky, Devyani Sharma

We introduce a new dataset of conversational speech representing English from India, Nigeria, and the United States.

TaTa: A Multilingual Table-to-Text Dataset for African Languages

1 code implementation 31 Oct 2022 Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur Parikh, Clara Rivera

To address this lack of data, we create Table-to-Text in African languages (TaTa), the first large multilingual table-to-text dataset with a focus on African languages.

Data-to-Text Generation

XTREME-S: Evaluating Cross-lingual Speech Representations

no code implementations 21 Mar 2022 Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning.

Representation Learning, Retrieval (+4 more)

Multimodal Pretraining for Dense Video Captioning

1 code implementation Asian Chapter of the Association for Computational Linguistics 2020 Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut

First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), featuring a variety of instructional videos together with time-stamped annotations.

Ranked #1 on Dense Video Captioning on YouCook2 (ROUGE-L metric, using extra training data)

Dense Video Captioning

Open-source Multi-speaker Corpora of the English Accents in the British Isles

no code implementations LREC 2020 Isin Demirsahin, Oddur Kjartansson, Alexander Gutkin, Clara Rivera

This paper presents a dataset of transcribed high-quality audio of English sentences recorded by volunteers speaking with different accents of the British Isles.

Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems

no code implementations LREC 2020 Fei He, Shan-Hui Cathy Chu, Oddur Kjartansson, Clara Rivera, Anna Katanova, Alexander Gutkin, Isin Demirsahin, Cibu Johny, Martin Jansche, Supheakmungkol Sarin, Knot Pipatsrisawat

We present free, high-quality multi-speaker speech corpora for Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu, which are six of the twenty-two official languages of India spoken by 374 million native speakers.

Speech Synthesis
