Search Results for author: Theresa Breiner

Found 7 papers, 1 paper with code

UserLibri: A Dataset for ASR Personalization Using Only Text

no code implementations • 2 Jul 2022 • Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey

We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg.

Language Modelling, Speech Recognition, +1

Building Machine Translation Systems for the Next Thousand Languages

no code implementations • 9 May 2022 • Ankur Bapna, Isaac Caswell, Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes

In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages.

Language Identification, Machine Translation, +1

Scaling Language Model Size in Cross-Device Federated Learning

no code implementations • FL4NLP (ACL) 2022 • Jae Hun Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Theertha Suresh, Shankar Kumar, Rajiv Mathews

Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks.

Federated Learning, Language Modelling, +2

Mining Large-Scale Low-Resource Pronunciation Data From Wikipedia

no code implementations • 27 Jan 2021 • Tania Chakraborty, Manasa Prasad, Theresa Breiner, Sandy Ritchie, Daan van Esch

Pronunciation modeling is a key task for building speech technology in new languages, and while solid grapheme-to-phoneme (G2P) mapping systems exist, language coverage can stand to be improved.

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

1 code implementation • COLING 2020 • Isaac Caswell, Theresa Breiner, Daan van Esch, Ankur Bapna

Large text corpora are increasingly important for a wide variety of Natural Language Processing (NLP) tasks, and automatic language identification (LangID) is a core technology needed to collect such datasets in a multilingual context.

Language Identification

Writing Across the World's Languages: Deep Internationalization for Gboard, the Google Keyboard

no code implementations • 3 Dec 2019 • Daan van Esch, Elnaz Sarbar, Tamar Lucassen, Jeremy O'Brien, Theresa Breiner, Manasa Prasad, Evan Crew, Chieu Nguyen, Françoise Beaufays

Today, Gboard supports 900+ language varieties across 70+ writing systems, and this report describes how and why we have been adding support for hundreds of language varieties from around the globe.

Automatic Keyboard Layout Design for Low-Resource Latin-Script Languages

no code implementations • 18 Jan 2019 • Theresa Breiner, Chieu Nguyen, Daan van Esch, Jeremy O'Brien

For many speakers, one of the barriers in accessing and creating text content on the web is the absence of input tools for their language.

Layout Design
