Search Results for author: Daan van Esch

Found 16 papers, 2 papers with code

Writing System and Speaker Metadata for 2,800+ Language Varieties

1 code implementation LREC 2022 Daan van Esch, Tamar Lucassen, Sebastian Ruder, Isaac Caswell, Clara Rivera

We describe an open-source dataset providing metadata for about 2,800 language varieties used in the world today.

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

no code implementations 5 Aug 2022 Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim

Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages.

Automatic Speech Recognition (ASR) +2

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

no code implementations 16 May 2022 Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang

However, there are not enough data sets for accented speech, and for the ones that are already available, more training approaches need to be explored to improve the quality of accented speech recognition.

Accented Speech Recognition Benchmarking +1

XTREME-S: Evaluating Cross-lingual Speech Representations

no code implementations 21 Mar 2022 Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning.

Representation Learning Retrieval +4

Handling Compounding in Mobile Keyboard Input

no code implementations 17 Jan 2022 Andreas Kabel, Keith Hall, Tom Ouyang, David Rybach, Daan van Esch, Françoise Beaufays

This paper proposes a framework to improve the typing experience of mobile users in morphologically rich languages.

Mining Large-Scale Low-Resource Pronunciation Data From Wikipedia

no code implementations 27 Jan 2021 Tania Chakraborty, Manasa Prasad, Theresa Breiner, Sandy Ritchie, Daan van Esch

Pronunciation modeling is a key task for building speech technology in new languages, and while solid grapheme-to-phoneme (G2P) mapping systems exist, language coverage can stand to be improved.

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

1 code implementation COLING 2020 Isaac Caswell, Theresa Breiner, Daan van Esch, Ankur Bapna

Large text corpora are increasingly important for a wide variety of Natural Language Processing (NLP) tasks, and automatic language identification (LangID) is a core technology needed to collect such datasets in a multilingual context.

Language Identification

Writing Across the World's Languages: Deep Internationalization for Gboard, the Google Keyboard

no code implementations 3 Dec 2019 Daan van Esch, Elnaz Sarbar, Tamar Lucassen, Jeremy O'Brien, Theresa Breiner, Manasa Prasad, Evan Crew, Chieu Nguyen, Françoise Beaufays

Today, Gboard supports 900+ language varieties across 70+ writing systems, and this report describes how and why we have been adding support for hundreds of language varieties from around the globe.

Automatic Keyboard Layout Design for Low-Resource Latin-Script Languages

no code implementations 18 Jan 2019 Theresa Breiner, Chieu Nguyen, Daan van Esch, Jeremy O'Brien

For many speakers, one of the barriers in accessing and creating text content on the web is the absence of input tools for their language.

Layout Design
