Search Results for author: Isaac Caswell

Found 18 papers, 7 papers with code

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

2 code implementations 21 Feb 2019 Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

Writing System and Speaker Metadata for 2,800+ Language Varieties

1 code implementation LREC 2022 Daan van Esch, Tamar Lucassen, Sebastian Ruder, Isaac Caswell, Clara Rivera

We describe an open-source dataset providing metadata for about 2,800 language varieties used in the world today.

Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation

1 code implementation 27 Mar 2023 Alex Jones, Isaac Caswell, Ishank Saxena, Orhan Firat

Neural machine translation (NMT) has progressed rapidly over the past several years, and modern models are able to achieve relatively high quality using only monolingual text data, an approach dubbed Unsupervised Machine Translation (UNMT).

Data Augmentation NMT +2

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

1 code implementation COLING 2020 Isaac Caswell, Theresa Breiner, Daan van Esch, Ankur Bapna

Large text corpora are increasingly important for a wide variety of Natural Language Processing (NLP) tasks, and automatic language identification (LangID) is a core technology needed to collect such datasets in a multilingual context.

Language Identification

BLEU might be Guilty but References are not Innocent

2 code implementations EMNLP 2020 Markus Freitag, David Grangier, Isaac Caswell

The quality of automatic metrics for machine translation has been increasingly called into question, especially for high-quality systems.

Machine Translation Translation

Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text

1 code implementation 11 Nov 2023 Isaac Caswell, Lisa Wang, Isabel Papadimitriou

Data quality is a problem that perpetually resurfaces throughout the field of NLP, regardless of task, domain, or architecture, and remains especially severe for lower-resource languages.

Language Modelling

APE at Scale and its Implications on MT Evaluation Biases

no code implementations WS 2019 Markus Freitag, Isaac Caswell, Scott Roy

In this work, we train an Automatic Post-Editing (APE) model and use it to reveal biases in standard Machine Translation (MT) evaluation procedures.

Automatic Post-Editing NMT +1

Tagged Back-Translation

no code implementations WS 2019 Isaac Caswell, Ciprian Chelba, David Grangier

Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data.

Machine Translation NMT +1

Translationese as a Language in "Multilingual" NMT

no code implementations 10 Nov 2019 Parker Riley, Isaac Caswell, Markus Freitag, David Grangier

Machine translation has an undesirable propensity to produce "translationese" artifacts, which can lead to higher BLEU scores while being liked less by human raters.

Machine Translation NMT +3

Translationese as a Language in "Multilingual" NMT

no code implementations ACL 2020 Parker Riley, Isaac Caswell, Markus Freitag, David Grangier

Machine translation has an undesirable propensity to produce "translationese" artifacts, which can lead to higher BLEU scores while being liked less by human raters.

Machine Translation NMT +3

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

no code implementations 9 Jan 2022 Aditya Siddhant, Ankur Bapna, Orhan Firat, Yuan Cao, Mia Xu Chen, Isaac Caswell, Xavier Garcia

While recent progress in massively multilingual MT is one step closer to reaching this goal, it is becoming evident that extending a multilingual MT system simply by training on more parallel data is unscalable, since the availability of labeled data for low-resource and non-English-centric language pairs is forbiddingly limited.

Machine Translation Self-Supervised Learning +1
