Search Results for author: Ankur Bapna

Found 29 papers, 7 papers with code

Joint Unsupervised and Supervised Training for Multilingual ASR

no code implementations 15 Nov 2021 Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

Our average WER across all languages outperforms the average monolingual baseline by 33.3%, and the state-of-the-art 2-stage XLSR by 32%.

Language Modelling, Speech Recognition +1

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference

no code implementations Findings (EMNLP) 2021 Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat

On WMT, our task-MoE with 32 experts (533M parameters) outperforms the best-performing token-level MoE model (token-MoE) by +1.0 BLEU on average across 30 language pairs.
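
The gain over token-level routing is easier to see in a toy sketch. The NumPy illustration below (dimensions, weights, and function names are invented for this example, not taken from the paper) contrasts the two: token-level routing picks experts per token at run time, while task-level routing fixes the expert subset per task, e.g. per language pair, so a small subnetwork can be extracted ahead of serving.

```python
import numpy as np

rng = np.random.default_rng(0)

def top2_token_routing(tokens, gate_w):
    """Token-level MoE: every token picks its own top-2 experts at run time."""
    logits = tokens @ gate_w                      # [num_tokens, num_experts]
    return np.argsort(logits, axis=-1)[:, -2:]    # per-token expert ids

def top2_task_routing(task_id, task_gate):
    """Task-level MoE: one expert assignment per task (e.g. language pair),
    fixed before inference, so only those experts need to be loaded."""
    return np.argsort(task_gate[task_id])[-2:]    # per-task expert ids

num_experts, d_model = 8, 16
tokens = rng.normal(size=(5, d_model))            # a 5-token batch
gate_w = rng.normal(size=(d_model, num_experts))
task_gate = rng.normal(size=(30, num_experts))    # 30 tasks / language pairs

print(top2_token_routing(tokens, gate_w))                 # varies per token
print(top2_task_routing(task_id=3, task_gate=task_gate))  # fixed for the task
```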

Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents

no code implementations 21 Sep 2021 Biao Zhang, Ankur Bapna, Melvin Johnson, Ali Dabirmoghaddam, Naveen Arivazhagan, Orhan Firat

Using simple concatenation-based DocNMT, we explore the effect of 3 factors on multilingual transfer: the number of document-supervised teacher languages, the data schedule for parallel documents at training, and the data condition of parallel documents (genuine vs. backtranslated).
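
A minimal sketch of what the concatenation-based data preparation looks like (the window size and separator token here are illustrative assumptions, not the paper's exact settings):

```python
def make_doc_examples(sentences, window=3, sep=" <sep> "):
    """Concatenation-based DocNMT: each training example is a window of
    consecutive sentences joined by a separator token; the target side is
    built the same way from the parallel sentences."""
    return [sep.join(sentences[i:i + window])
            for i in range(len(sentences) - window + 1)]

doc = ["He sat down.", "The dog followed.", "It was raining.", "They waited."]
for example in make_doc_examples(doc):
    print(example)
```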

Document-level Machine Translation +1

Gradient-guided Loss Masking for Neural Machine Translation

no code implementations 26 Feb 2021 Xinyi Wang, Ankur Bapna, Melvin Johnson, Orhan Firat

To mitigate the negative effect of low-quality training data on the performance of neural machine translation models, most existing strategies focus on filtering out harmful data before training starts.
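
Loss masking instead intervenes during training. A rough NumPy sketch of the general idea behind gradient-guided masking, assuming (as a simplification, not the paper's exact criterion) that an example is kept only when its gradient aligns with the gradient computed on a small trusted set:

```python
import numpy as np

def gradient_guided_mask(per_example_grads, clean_grad):
    """Keep an example only if the dot product between its gradient and the
    gradient on a small clean/trusted set is positive; mask it otherwise."""
    alignment = per_example_grads @ clean_grad     # [batch] dot products
    return (alignment > 0).astype(np.float32)      # 1 = keep, 0 = mask

rng = np.random.default_rng(0)
per_example_grads = rng.normal(size=(4, 10))  # one flattened grad per example
clean_grad = rng.normal(size=10)              # grad on the trusted dev set
mask = gradient_guided_mask(per_example_grads, clean_grad)
# masked_loss = (mask * per_example_losses).sum() / max(mask.sum(), 1.0)
print(mask)
```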

Machine Translation, Translation

Exploring Routing Strategies for Multilingual Mixture-of-Experts Models

no code implementations 1 Jan 2021 Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Thang Luong, Orhan Firat

Sparsely-Gated Mixture-of-Experts (MoE) has been a successful approach for scaling multilingual translation models to billions of parameters without a proportional increase in training computation.
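
For reference, a toy NumPy version of the sparsely-gated top-k routing this line of work builds on (experts reduced to single tanh layers and all shapes invented for the example):

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Sparsely-gated MoE: each token is sent to its top-k experts only, and
    their outputs are combined with renormalised gate probabilities."""
    gate = np.exp(x @ gate_w)
    gate /= gate.sum(-1, keepdims=True)           # softmax over experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(gate[t])[-k:]            # top-k expert ids
        weights = gate[t, top] / gate[t, top].sum()
        for e, w in zip(top, weights):
            out[t] += w * np.tanh(x[t] @ experts[e])  # toy expert network
    return out

rng = np.random.default_rng(0)
d_model, num_experts = 8, 4
x = rng.normal(size=(3, d_model))
out = moe_layer(x, rng.normal(size=(d_model, num_experts)),
                rng.normal(size=(num_experts, d_model, d_model)))
print(out.shape)  # (3, 8); only k of the 4 experts ran per token
```

Because only k experts run per token, parameter count grows with the number of experts while per-token computation stays roughly constant.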

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

1 code implementation COLING 2020 Isaac Caswell, Theresa Breiner, Daan van Esch, Ankur Bapna

Large text corpora are increasingly important for a wide variety of Natural Language Processing (NLP) tasks, and automatic language identification (LangID) is a core technology needed to collect such datasets in a multilingual context.
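
A first-pass filtering loop of the kind such corpus collection starts from might look like the sketch below; `predict_language` is a hypothetical classifier interface, and the paper's "unexpected challenges" are precisely the ways a simple threshold like this fails at web scale:

```python
def filter_corpus(lines, predict_language, target_lang, min_confidence=0.9):
    """Keep only lines the LangID classifier assigns to the target language
    with high confidence; real pipelines need much more filtering."""
    kept = []
    for line in lines:
        lang, confidence = predict_language(line)
        if lang == target_lang and confidence >= min_confidence:
            kept.append(line)
    return kept

def toy_langid(line):
    """Stand-in classifier; real systems use n-gram or neural LangID models."""
    return ("en", 0.95) if line.isascii() else ("und", 0.30)

print(filter_corpus(["hello world", "¿dónde está?"], toy_langid, "en"))
```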

Language Identification

Controlling Computation versus Quality for Neural Sequence Models

no code implementations 17 Feb 2020 Ankur Bapna, Naveen Arivazhagan, Orhan Firat

Further, methods that adapt the amount of computation to the example focus on finding a fixed inference-time computational graph per example, ignoring any external computational budgets or varying inference-time limitations.
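
A much-simplified sketch of what example-external budget control can mean at inference (the paper studies learned conditional-computation mechanisms; here the budget merely truncates the layer stack, and all names and shapes are illustrative):

```python
import numpy as np

def budgeted_forward(x, layers, budget):
    """Run a budget-dependent prefix of the layer stack, so one trained model
    can be served under different computational budgets."""
    num_layers = max(1, round(budget * len(layers)))   # budget in (0, 1]
    for w in layers[:num_layers]:
        x = np.tanh(x @ w)
    return x

rng = np.random.default_rng(0)
layers = [0.1 * rng.normal(size=(8, 8)) for _ in range(6)]
x = rng.normal(size=8)
full = budgeted_forward(x, layers, budget=1.0)    # all 6 layers
cheap = budgeted_forward(x, layers, budget=0.5)   # only 3 layers
print(full.shape, cheap.shape)
```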

Unsupervised Representation Learning

Faster Transformer Decoding: N-gram Masked Self-Attention

no code implementations 14 Jan 2020 Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer

Motivated by the fact that most of the information relevant to the prediction of target tokens is drawn from the source sentence $S=s_1, \ldots, s_S$, we propose truncating the target-side window used for computing self-attention by making an $N$-gram assumption.
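
The resulting mask is just the usual causal mask intersected with a band of width $N$; a small NumPy sketch:

```python
import numpy as np

def ngram_causal_mask(seq_len, n):
    """Decoder self-attention mask under the N-gram assumption: position i
    may attend only to positions i-n+1 .. i (a causal band of width n)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - n)   # True = attention allowed

print(ngram_causal_mask(6, n=3).astype(int))
# Each row has at most n ones, so per-step self-attention cost is O(n)
# instead of growing with the length of the target prefix.
```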

Fill in the Blanks: Imputing Missing Sentences for Larger-Context Neural Machine Translation

no code implementations 30 Oct 2019 Sébastien Jean, Ankur Bapna, Orhan Firat

In particular, we consider three distinct approaches to generate the missing context: using random contexts, applying a copy heuristic or generating it with a language model.
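
The three options are simple to sketch; in the snippet below `lm.generate` is a hypothetical language-model interface, and "random" draws from the same document purely for brevity:

```python
import random

def impute_context(doc_sentences, idx, method, lm=None):
    """Fill in a missing previous sentence for sentence `idx` using one of
    three strategies: a random sentence, a copy of the current sentence, or
    a language-model generation."""
    current = doc_sentences[idx]
    if method == "random":
        return random.choice(doc_sentences)  # random context
    if method == "copy":
        return current                       # copy heuristic
    if method == "lm":
        return lm.generate(prefix=current)   # hypothetical LM call
    raise ValueError(f"unknown method: {method}")

doc = ["A storm was coming.", "She shut the windows.", "Then the rain hit."]
print(impute_context(doc, idx=1, method="copy"))
```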

Document-level Machine Translation +3

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

no code implementations 1 Sep 2019 Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman

The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model.

Cross-Lingual Transfer, Machine Translation +2

The Missing Ingredient in Zero-Shot Neural Machine Translation

no code implementations 17 Mar 2019 Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages.

Machine Translation, Translation

Non-Parametric Adaptation for Neural Machine Translation

no code implementations NAACL 2019 Ankur Bapna, Orhan Firat

Neural Networks trained with gradient descent are known to be susceptible to catastrophic forgetting caused by parameter shift during the training process.

Domain Adaptation, Machine Translation +2

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

3 code implementations 21 Feb 2019 Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

Revisiting Character-Based Neural Machine Translation with Capacity and Compression

no code implementations EMNLP 2018 Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, Wolfgang Macherey

Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering.
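
On the input side the change is trivial, which is part of the appeal; a sketch (the special-token names are placeholders):

```python
def to_char_sequence(text, bos="<s>", eos="</s>"):
    """Character-level NMT input: split into characters rather than words or
    learned subword fragments, removing the segmentation step entirely."""
    return [bos] + list(text) + [eos]

print(to_char_sequence("cat"))  # ['<s>', 'c', 'a', 't', '</s>']
# Sequences become much longer than with subwords, which is why the paper
# studies pairing character inputs with higher-capacity models.
```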

Feature Engineering, Machine Translation +1

Training Deeper Neural Machine Translation Models with Transparent Attention

no code implementations EMNLP 2018 Ankur Bapna, Mia Xu Chen, Orhan Firat, Yuan Cao, Yonghui Wu

While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications.
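
Transparent attention, the fix the paper proposes, lets every decoder layer attend to a learned softmax-weighted combination of all encoder layer outputs instead of only the top one, shortening gradient paths through deep encoders. A NumPy sketch with invented shapes:

```python
import numpy as np

def transparent_attention_memories(encoder_layer_outputs, layer_logits):
    """Build, for each decoder layer, the encoder 'memory' it attends to as a
    softmax-weighted combination over all encoder layer outputs."""
    enc = np.stack(encoder_layer_outputs)          # [L_enc, T, d]
    weights = np.exp(layer_logits)
    weights /= weights.sum(-1, keepdims=True)      # softmax over encoder layers
    return np.einsum("de,etk->dtk", weights, enc)  # one memory per decoder layer

rng = np.random.default_rng(0)
enc_outs = [rng.normal(size=(5, 8)) for _ in range(6)]  # 6 encoder layers
logits = rng.normal(size=(4, 6))                        # 4 decoder layers
memories = transparent_attention_memories(enc_outs, logits)
print(memories.shape)  # (4, 5, 8)
```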

Machine Translation, Translation

Building a Conversational Agent Overnight with Dialogue Self-Play

3 code implementations 15 Jan 2018 Pararth Shah, Dilek Hakkani-Tür, Gokhan Tür, Abhinav Rastogi, Ankur Bapna, Neha Nayak, Larry Heck

We propose Machines Talking To Machines (M2M), a framework combining automation and crowdsourcing to rapidly bootstrap end-to-end dialogue agents for goal-oriented dialogues in arbitrary domains.
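
A toy sketch of the self-play stage (dialogue-act names and the schema format are invented for illustration; in M2M the generated outlines are subsequently paraphrased into natural language by crowd workers):

```python
def self_play_outline(slots, user_goal):
    """Simulated user and rule-based system exchange dialogue acts over a
    task schema until the goal's slots are filled, yielding an outline."""
    outline, state = [], {}
    for slot in slots:
        outline.append(("system", f"request({slot})"))
        outline.append(("user", f"inform({slot}={user_goal[slot]})"))
        state[slot] = user_goal[slot]
    outline.append(("system", f"confirm({state})"))
    return outline

for turn in self_play_outline(["date", "time"],
                              {"date": "friday", "time": "7pm"}):
    print(turn)
```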

Towards Zero-Shot Frame Semantic Parsing for Domain Scaling

1 code implementation 7 Jul 2017 Ankur Bapna, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck

While multi-task training of such models alleviates the need for large in-domain annotated datasets, bootstrapping a semantic parsing model for a new domain using only the semantic frame, such as the back-end API or knowledge graph schema, is still one of the holy grail tasks of language understanding for dialogue systems.
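
One way to see why conditioning on the frame helps: if the tagger scores tokens against encodings of natural-language slot descriptions rather than a fixed slot inventory, unseen slots need no retraining. The sketch below illustrates that general idea only, not the paper's exact model:

```python
import numpy as np

def zero_shot_slot_assignments(token_vecs, slot_desc_vecs):
    """Assign each encoded token to the slot whose description embedding it
    is most similar to; new slots only require a new description vector."""
    similarities = token_vecs @ slot_desc_vecs.T   # [num_tokens, num_slots]
    return similarities.argmax(-1)                 # best slot per token

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 16))      # encoded utterance tokens
slot_descs = rng.normal(size=(3, 16))  # e.g. "date", "time", "party size"
print(zero_shot_slot_assignments(tokens, slot_descs))
```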

Language Understanding, Semantic Parsing +1

Sequential Dialogue Context Modeling for Spoken Language Understanding

1 code implementation WS 2017 Ankur Bapna, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck

We compare the performance of our proposed architecture with two context models, one that uses just the previous turn context and another that encodes dialogue context in a memory network, but loses the order of utterances in the dialogue history.
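
The property being tested is whether the context encoder preserves turn order. A toy recurrence that does (the fixed mixing weights stand in for learned parameters):

```python
import numpy as np

def sequential_dialogue_context(utterance_vecs):
    """Order-preserving context encoding: a simple recurrence over
    per-utterance vectors, unlike a memory network that treats past turns
    as an unordered set."""
    hidden = np.zeros(utterance_vecs.shape[1])
    for u in utterance_vecs:              # oldest turn first
        hidden = np.tanh(0.5 * hidden + 0.5 * u)
    return hidden                         # context vector for the current turn

rng = np.random.default_rng(0)
turns = rng.normal(size=(3, 16))          # three previous encoded utterances
print(sequential_dialogue_context(turns).shape)  # (16,)
```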

Goal-Oriented Dialogue Systems, Language Understanding +1
