Search Results for author: Orhan Firat

Found 55 papers, 13 papers with code

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation

2 code implementations • ICML 2020 • Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing.

Zero-Shot Cross-Lingual Transfer

A Loss Curvature Perspective on Training Instability in Deep Learning

no code implementations • 8 Oct 2021 • Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George Dahl, Zachary Nado, Orhan Firat

In this work, we study the evolution of the loss Hessian across many classification tasks in order to understand the effect the curvature of the loss has on the training dynamics.
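
As a rough illustration of the kind of curvature measurement involved, the sketch below estimates the largest Hessian eigenvalue of a loss (a standard "sharpness" proxy) by power iteration on Hessian-vector products. It is a minimal JAX sketch assuming a single flat parameter array; all names are illustrative, not the paper's code.

    import jax
    import jax.numpy as jnp

    def top_hessian_eigenvalue(loss_fn, params, num_iters=20, seed=0):
        """Estimate the largest eigenvalue of the Hessian of loss_fn at params."""
        def hvp(v):
            # Hessian-vector product via forward-over-reverse differentiation.
            return jax.jvp(jax.grad(loss_fn), (params,), (v,))[1]

        v = jax.random.normal(jax.random.PRNGKey(seed), params.shape)
        v = v / jnp.linalg.norm(v)
        for _ in range(num_iters):
            hv = hvp(v)
            v = hv / jnp.linalg.norm(hv)
        return float(jnp.vdot(v, hvp(v)))  # Rayleigh quotient of the converged vector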

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference

no code implementations • Findings (EMNLP) 2021 • Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat

On WMT, our task-MoE with 32 experts (533M parameters) outperforms the best performing token-level MoE model (token-MoE) by +1.0 BLEU on average across 30 language pairs.

Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents

no code implementations • 21 Sep 2021 • Biao Zhang, Ankur Bapna, Melvin Johnson, Ali Dabirmoghaddam, Naveen Arivazhagan, Orhan Firat

Using simple concatenation-based DocNMT, we explore the effect of 3 factors on multilingual transfer: the number of document-supervised teacher languages, the data schedule for parallel documents at training, and the data condition of parallel documents (genuine vs. backtranslated).
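
A minimal sketch of what concatenation-based DocNMT data preparation can look like: consecutive sentence pairs are joined into one sequence with a separator token, so a standard sentence-level NMT model can train on documents. The separator token and window size below are assumptions, not the paper's exact setup.

    SEP = " <sep> "  # assumed sentence-separator token

    def make_doc_examples(src_sents, tgt_sents, window=4):
        """Group consecutive sentence pairs into document-level training examples."""
        examples = []
        for i in range(0, len(src_sents), window):
            src_doc = SEP.join(src_sents[i:i + window])
            tgt_doc = SEP.join(tgt_sents[i:i + window])
            examples.append((src_doc, tgt_doc))
        return examples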

Document-level Machine Translation +1

Towards Zero-Label Language Learning

no code implementations • 19 Sep 2021 • Zirui Wang, Adams Wei Yu, Orhan Firat, Yuan Cao

This paper explores zero-label learning in Natural Language Processing (NLP), whereby no human-annotated data is used anywhere during training and models are trained purely on synthetic data.

Data Augmentation

Evaluating Multiway Multilingual NMT in the Turkic Languages

no code implementations • 13 Sep 2021 • Jamshidbek Mirzakhalov, Anoop Babu, Aigiz Kunafin, Ahsan Wahab, Behzod Moydinboyev, Sardana Ivanova, Mokhiyakhon Uzokova, Shaxnoza Pulatova, Duygu Ataman, Julia Kreutzer, Francis Tyers, Orhan Firat, John Licato, Sriram Chellappan

Then, we train 26 bilingual baselines as well as a multi-way neural MT (MNMT) model using the corpus and perform an extensive analysis using automatic metrics as well as human evaluations.

Machine Translation

Towards Universality in Multilingual Text Rewriting

no code implementations • 30 Jul 2021 • Xavier Garcia, Noah Constant, Mandy Guo, Orhan Firat

In this work, we take the first steps towards building a universal rewriter: a model capable of rewriting text in any language to exhibit a wide variety of attributes, including styles and languages, while preserving as much of the original semantics as possible.

Translation

Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution

no code implementations • NAACL 2021 • Xavier Garcia, Noah Constant, Ankur P. Parikh, Orhan Firat

We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models, paving the way towards efficient continual learning for multilingual machine translation.
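
The core of such a vocabulary-substitution scheme can be pictured as below: embedding rows for tokens shared between the old and new vocabularies are copied over, and only genuinely new tokens are initialised from scratch. A hedged sketch with illustrative names, not the paper's implementation.

    import numpy as np

    def substitute_vocab(old_emb, old_vocab, new_vocab, seed=0):
        """Build a new embedding table, reusing rows for tokens shared with the old vocab."""
        rng = np.random.default_rng(seed)
        new_emb = rng.normal(scale=0.02, size=(len(new_vocab), old_emb.shape[1]))
        for token, new_id in new_vocab.items():
            if token in old_vocab:  # shared token: keep its trained embedding
                new_emb[new_id] = old_emb[old_vocab[token]]
        return new_emb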

Continual Learning, Machine Translation +1

Gradient-guided Loss Masking for Neural Machine Translation

no code implementations • 26 Feb 2021 • Xinyi Wang, Ankur Bapna, Melvin Johnson, Orhan Firat

To mitigate the negative effect of low-quality training data on the performance of neural machine translation models, most existing strategies focus on filtering out harmful data before training starts.
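
Gradient-guided masking instead operates during training. As a rough, simplified sketch of the idea (not the paper's exact algorithm): an example's loss is masked out when its gradient points against the gradient computed on a small trusted set.

    import numpy as np

    def masked_loss(losses, example_grads, clean_grad):
        """Zero out losses of examples whose gradients disagree with the clean-set gradient.

        losses: [n]; example_grads: [n, num_params]; clean_grad: [num_params]
        """
        alignment = example_grads @ clean_grad
        mask = (alignment > 0).astype(np.float64)  # keep examples that help the clean loss
        return (mask * losses).sum() / max(mask.sum(), 1.0)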

Machine Translation, Translation

Exploring Routing Strategies for Multilingual Mixture-of-Experts Models

no code implementations • 1 Jan 2021 • Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Thang Luong, Orhan Firat

Sparsely-Gated Mixture-of-Experts (MoE) has been a successful approach for scaling multilingual translation models to billions of parameters without a proportional increase in training computation.
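
A minimal top-2 gating sketch in that spirit: each token is routed to its two highest-scoring experts, so per-token compute stays roughly constant as the expert count grows. Shapes and names below are illustrative assumptions.

    import numpy as np

    def top2_gate(x, w_gate):
        """Pick the two best experts per token and their mixing weights.

        x: [tokens, d_model]; w_gate: [d_model, num_experts]
        """
        logits = x @ w_gate
        top2 = np.argsort(logits, axis=-1)[:, -2:]  # ids of the two best experts per token
        scores = np.take_along_axis(logits, top2, axis=-1)
        weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
        return top2, weights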

Rapid Domain Adaptation for Machine Translation with Monolingual Data

no code implementations • 23 Oct 2020 • Mahdis Mahdieh, Mia Xu Chen, Yuan Cao, Orhan Firat

In this paper, we propose an approach that enables rapid domain adaptation from the perspective of unsupervised translation.

Domain Adaptation, Machine Translation +1

Towards End-to-End In-Image Neural Machine Translation

no code implementations • EMNLP (nlpbt) 2020 • Elman Mansimov, Mitchell Stern, Mia Chen, Orhan Firat, Jakob Uszkoreit, Puneet Jain

In this paper, we offer a preliminary investigation into the task of in-image machine translation: transforming an image containing text in one language into an image containing the same text in another language.

Machine Translation, Translation

Complete Multilingual Neural Machine Translation

no code implementations • WMT (EMNLP) 2020 • Markus Freitag, Orhan Firat

We reintroduce this direct parallel data from multi-way aligned corpora between all source and target languages.
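
Conceptually, mining such direct pairs from a multi-way aligned corpus is straightforward; the hedged sketch below assumes each record holds the same sentence in several languages and simply reads off every source-target combination.

    from itertools import permutations

    def direct_pairs(multiway_rows):
        """Extract all direct language pairs from multi-way aligned records.

        multiway_rows: iterable of dicts like {"en": ..., "de": ..., "fr": ...}
        """
        pairs = {}
        for row in multiway_rows:
            for src, tgt in permutations(row, 2):
                pairs.setdefault((src, tgt), []).append((row[src], row[tgt]))
        return pairs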

Machine Translation, Transfer Learning +1

Explicit Alignment Objectives for Multilingual Bidirectional Encoders

no code implementations • NAACL 2021 • Junjie Hu, Melvin Johnson, Orhan Firat, Aditya Siddhant, Graham Neubig

Pre-trained cross-lingual encoders such as mBERT (Devlin et al., 2019) and XLMR (Conneau et al., 2020) have proven to be impressively effective at enabling transfer-learning of NLP systems from high-resource languages to low-resource languages.

Sentence Classification, Transfer Learning +1

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

2 code implementations • ICLR 2021 • Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Neural network scaling has been critical for improving the model quality in many real-world machine learning applications with vast amounts of training data and compute.

Machine Translation, Translation

Controlling Computation versus Quality for Neural Sequence Models

no code implementations • 17 Feb 2020 • Ankur Bapna, Naveen Arivazhagan, Orhan Firat

Further, methods that adapt the amount of computation to the example focus on finding a fixed inference-time computational graph per example, ignoring any external computational budgets or varying inference time limitations.

Unsupervised Representation Learning

On the Discrepancy between Density Estimation and Sequence Generation

1 code implementation • EMNLP (spnlp) 2020 • Jason Lee, Dustin Tran, Orhan Firat, Kyunghyun Cho

In this paper, by comparing several density estimators on five machine translation tasks, we find that the correlation between rankings of models based on log-likelihood and BLEU varies significantly depending on the range of the model families being compared.
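
The comparison at issue can be phrased as a rank correlation between two model orderings; a small sketch using SciPy's spearmanr, with hypothetical numbers in place of real results:

    from scipy.stats import spearmanr

    # Hypothetical per-model scores, not results from the paper.
    log_likelihoods = [-1.92, -2.10, -1.85, -2.40]
    bleu_scores = [27.1, 28.3, 26.0, 24.2]

    rho, _ = spearmanr(log_likelihoods, bleu_scores)
    print(f"Spearman correlation between LL and BLEU rankings: {rho:.2f}")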

Density Estimation, Latent Variable Models +3

Fill in the Blanks: Imputing Missing Sentences for Larger-Context Neural Machine Translation

no code implementations • 30 Oct 2019 • Sébastien Jean, Ankur Bapna, Orhan Firat

In particular, we consider three distinct approaches to generate the missing context: using random contexts, applying a copy heuristic or generating it with a language model.
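
The three strategies map naturally onto a small dispatch; in this sketch the language-model interface is a placeholder assumption rather than the paper's actual model.

    import random

    def impute_context(sentence, corpus, strategy, lm=None):
        """Generate a stand-in context sentence using one of three strategies."""
        if strategy == "random":
            return random.choice(corpus)         # an unrelated sentence as context
        if strategy == "copy":
            return sentence                      # copy heuristic: reuse the source itself
        if strategy == "lm":
            return lm.generate(prefix=sentence)  # placeholder LM interface
        raise ValueError(f"unknown strategy: {strategy}")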

Document-level, Document Level Machine Translation +3

On the Importance of Word Boundaries in Character-level Neural Machine Translation

1 code implementation • WS 2019 • Duygu Ataman, Orhan Firat, Mattia A. Di Gangi, Marcello Federico, Alexandra Birch

Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality.

Machine Translation, Translation

Adaptive Scheduling for Multi-Task Learning

no code implementations • 13 Sep 2019 • Sébastien Jean, Orhan Firat, Melvin Johnson

To train neural machine translation models simultaneously on multiple tasks (languages), it is common to sample each task uniformly or in proportion to dataset sizes.
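
The two static baselines mentioned here are easy to write down (the paper's contribution is an adaptive schedule that adjusts such weights during training); a minimal sketch, with a temperature knob as a common smoothing assumption:

    import numpy as np

    def task_probs(dataset_sizes, mode="proportional", temperature=1.0):
        """Static sampling distribution over tasks: uniform or size-proportional."""
        sizes = np.asarray(dataset_sizes, dtype=np.float64)
        if mode == "uniform":
            p = np.ones_like(sizes)
        else:
            p = sizes ** (1.0 / temperature)  # temperature > 1 flattens the distribution
        return p / p.sum()

    # e.g. task_probs([10_000, 1_000_000], "proportional", temperature=5.0)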

Machine Translation, Multi-Task Learning +1

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

no code implementations • 1 Sep 2019 • Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, Orhan Firat, Karthik Raman

The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model.

Cross-Lingual Transfer, Machine Translation +2

The Missing Ingredient in Zero-Shot Neural Machine Translation

no code implementations • 17 Mar 2019 • Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages.

Machine Translation, Translation

Non-Parametric Adaptation for Neural Machine Translation

no code implementations • NAACL 2019 • Ankur Bapna, Orhan Firat

Neural Networks trained with gradient descent are known to be susceptible to catastrophic forgetting caused by parameter shift during the training process.

Domain Adaptation, Machine Translation +2

Massively Multilingual Neural Machine Translation

no code implementations • NAACL 2019 • Roee Aharoni, Melvin Johnson, Orhan Firat

Our experiments on a large-scale dataset with 102 languages to and from English and up to one million examples per direction also show promising results, surpassing strong bilingual baselines and encouraging future work on massively multilingual NMT.

Machine Translation, Translation

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

3 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.

Sequence-To-Sequence, Speech Recognition

Hallucinations in Neural Machine Translation

no code implementations • 27 Sep 2018 • Katherine Lee, Orhan Firat, Ashish Agarwal, Clara Fannjiang, David Sussillo

Neural machine translation (NMT) systems have reached state-of-the-art performance in translating text and are in wide deployment.

Data Augmentation, Machine Translation +1

Zero-Shot Cross-lingual Classification Using Multilingual Neural Machine Translation

no code implementations • 12 Sep 2018 • Akiko Eriguchi, Melvin Johnson, Orhan Firat, Hideto Kazawa, Wolfgang Macherey

However, little attention has been paid to leveraging representations learned by a multilingual NMT system to enable zero-shot multilinguality in other NLP tasks.

Classification, Cross-Lingual Transfer +5

Revisiting Character-Based Neural Machine Translation with Capacity and Compression

no code implementations • EMNLP 2018 • Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, Wolfgang Macherey

Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering.

Feature Engineering, Machine Translation +1

Training Deeper Neural Machine Translation Models with Transparent Attention

no code implementations • EMNLP 2018 • Ankur Bapna, Mia Xu Chen, Orhan Firat, Yuan Cao, Yonghui Wu

While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications.

Machine Translation, Translation

Does Neural Machine Translation Benefit from Larger Context?

no code implementations • 17 Apr 2017 • Sebastien Jean, Stanislas Lauly, Orhan Firat, Kyunghyun Cho

We propose a neural machine translation architecture that models the surrounding text in addition to the source sentence.

Machine Translation, Translation

Zero-Resource Translation with Multi-Lingual Neural Machine Translation

no code implementations • EMNLP 2016 • Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, Kyunghyun Cho

In this paper, we propose a novel finetuning algorithm for the recently introduced multi-way, multilingual neural machine translation model that enables zero-resource machine translation.

Machine Translation, Translation

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

Dimensionality Reduction, General Classification

On Using Monolingual Corpora in Neural Machine Translation

no code implementations • 11 Mar 2015 • Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation.

Machine Translation, Translation

Learning Deep Temporal Representations for Brain Decoding

no code implementations • 23 Dec 2014 • Orhan Firat, Emre Aksan, Ilke Oztekin, Fatos T. Yarman Vural

By employing the proposed temporal convolutional architecture with spatial pooling, raw input fMRI data is mapped to a non-linear, highly-expressive and low-dimensional feature space where the final classification is conducted.
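
A hypothetical PyTorch rendering of that pipeline: temporal 1-D convolutions with pooling map raw fMRI time series into a low-dimensional feature space, followed by a linear classifier. All layer sizes here are assumptions, not the paper's architecture.

    import torch.nn as nn

    num_voxels, num_classes = 64, 8  # assumed input channels and label count
    model = nn.Sequential(
        nn.Conv1d(num_voxels, 32, kernel_size=5),  # temporal convolution over the time axis
        nn.ReLU(),
        nn.MaxPool1d(kernel_size=2),               # pooling toward a low-dimensional code
        nn.Flatten(),
        nn.LazyLinear(num_classes),                # final classification layer
    )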

Brain Decoding, General Classification

Discriminative Functional Connectivity Measures for Brain Decoding

no code implementations • 23 Feb 2014 • Orhan Firat, Mete Ozay, Ilke Oztekin, Fatos T. Yarman Vural

The proposed method was tested on a recognition memory experiment, including data pertaining to encoding and retrieval of words belonging to ten different semantic categories.

Brain Decoding, Time Series
