Search Results for author: Jason Riesa

Found 16 papers, 1 paper with code

SQuId: Measuring Speech Naturalness in Many Languages

no code implementations • 12 Oct 2022 • Thibault Sellam, Ankur Bapna, Joshua Camp, Diana Mackinnon, Ankur P. Parikh, Jason Riesa

The main insight is that training one model on many locales consistently outperforms mono-locale baselines.

FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

no code implementations • 1 Oct 2022 • Parker Riley, Timothy Dozat, Jan A. Botha, Xavier Garcia, Dan Garrette, Jason Riesa, Orhan Firat, Noah Constant

We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation.

Machine Translation, Translation

XTREME-S: Evaluating Cross-lingual Speech Representations

no code implementations • 21 Mar 2022 • Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning.

Representation Learning, Retrieval, +4 more

mSLAM: Massively multilingual joint pre-training for speech and text

no code implementations • 3 Feb 2022 • Ankur Bapna, Colin Cherry, Yu Zhang, Ye Jia, Melvin Johnson, Yong Cheng, Simran Khanuja, Jason Riesa, Alexis Conneau

We present mSLAM, a multilingual Speech and LAnguage Model that learns cross-lingual cross-modal representations of speech and text by pre-training jointly on large amounts of unlabeled speech and text in multiple languages.

Ranked #1 on Spoken Language Identification on FLEURS (using extra training data)

Intent Classification, +4 more

Improving Multilingual Models with Language-Clustered Vocabularies

no code implementations • EMNLP 2020 • Hyung Won Chung, Dan Garrette, Kiat Chuan Tan, Jason Riesa

State-of-the-art multilingual models depend on vocabularies that cover all of the languages the model will expect to see at inference time, but the standard methods for generating those vocabularies are not ideal for massively multilingual applications.

NER

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

no code implementations • 1 Sep 2019 • Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman

The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model.

Cross-Lingual Transfer, Machine Translation, +3 more

Small and Practical BERT Models for Sequence Labeling

no code implementations • IJCNLP 2019 • Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer

We propose a practical scheme to train a single multilingual sequence labeling model that yields state-of-the-art results and is small and fast enough to run on a single CPU.

Part-Of-Speech Tagging

A Fast, Compact, Accurate Model for Language Identification of Codemixed Text

no code implementations • EMNLP 2018 • Yuan Zhang, Jason Riesa, Daniel Gillick, Anton Bakalov, Jason Baldridge, David Weiss

We address fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages.

Language Identification
