Search Results for author: Myle Ott

Found 42 papers, 22 papers with code

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

no code implementations • 21 Apr 2023 • Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li

It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains.

Paper
Add Code

OPT: Open Pre-trained Transformer Language Models

7 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.

Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs

Decoder Hate Speech Detection +2

6,404

Paper
Code

Efficient Language Modeling with Sparse all-MLP

no code implementations • 14 Mar 2022 • Ping Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li

All-MLP architectures have attracted increasing interest as an alternative to attention-based models.

Ranked #17 on Question Answering on StoryCloze

Common Sense Reasoning In-Context Learning +4

Paper
Add Code

Few-shot Learning with Multilingual Language Models

2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

Large-scale generative language models such as GPT-3 are competitive few-shot learners.

Cross-Lingual Transfer Few-Shot Learning +5

29,427

Paper
Code

Efficient Large Scale Language Modeling with Mixtures of Experts

no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.

Language Modelling

Paper
Add Code

Sustainable AI: Environmental Implications, Challenges and Opportunities

no code implementations • 30 Oct 2021 • Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, James Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks, Geeta Chauhan, Benjamin Lee, Hsien-Hsin S. Lee, Bugra Akyildiz, Maximilian Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, Kim Hazelwood

This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware.

Paper
Add Code

NormFormer: Improved Transformer Pretraining with Extra Normalization

1 code implementation • 18 Oct 2021 • Sam Shleifer, Jason Weston, Myle Ott

The extra operations incur negligible compute cost (+0. 4% parameter increase), but improve pretraining perplexity and downstream task performance for both causal and masked language models ranging from 125 Million to 2. 7 Billion parameters.

Language Modelling Masked Language Modeling

29,428

Paper
Code

On Anytime Learning at Macroscale

1 code implementation • 17 Jun 2021 • Lucas Caccia, Jing Xu, Myle Ott, Marc'Aurelio Ranzato, Ludovic Denoyer

Practitioners have then to decide how to allocate their computational budget in order to obtain the best performance at any point in time.

Language Modelling Learning Theory

Paper
Code

Larger-Scale Transformers for Multilingual Masked Language Modeling

no code implementations • ACL (RepL4NLP) 2021 • Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau

Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0. 3% on average while handling 99 more languages.

Masked Language Modeling XLM-R

Paper
Add Code

Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models

no code implementations • EACL 2021 • Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng

We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent.

Decoder Response Generation +2

Paper
Add Code

Few-shot Sequence Learning with Transformers

no code implementations • 17 Dec 2020 • Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples.

Few-Shot Learning

Paper
Add Code

Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art

1 code implementation • 1 Nov 2020 • Patrick Lewis, Myle Ott, Jingfei Du, Veselin Stoyanov

A large array of pretrained models are available to the biomedical NLP (BioNLP) community.

151

Paper
Code

Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences

1 code implementation • Proceedings of the National Academy of Sciences 2020 • Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, Rob Fergus

In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation.

Language Modelling Representation Learning

2,889

Paper
Code

General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference

no code implementations • Findings of the Association for Computational Linguistics 2020 • Jingfei Du, Myle Ott, Haoran Li, Xing Zhou, Veselin Stoyanov

The resulting method offers a compelling solution for using large-scale pre-trained models at a fraction of the computational cost when multiple tasks are performed on the same text.

Knowledge Distillation Quantization

Paper
Add Code

Recipes for building an open-domain chatbot

7 code implementations • EACL 2021 • Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston

Building open-domain chatbots is a challenging area for machine learning research.

Chatbot

126,503

Paper
Code

Residual Energy-Based Models for Text Generation

1 code implementation • ICLR 2020 • Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.

Language Modelling Machine Translation +2

Paper
Code

Residual Energy-Based Models for Text

no code implementations • 6 Apr 2020 • Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

Current large-scale auto-regressive language models display impressive fluency and can generate convincing text.

Paper
Add Code

How Decoding Strategies Affect the Verifiability of Generated Text

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Luca Massarelli, Fabio Petroni, Aleksandra Piktus, Myle Ott, Tim Rocktäschel, Vassilis Plachouras, Fabrizio Silvestri, Sebastian Riedel

A generated sentence is verifiable if it can be corroborated or disproved by Wikipedia, and we find that the verifiability of generated text strongly depends on the decoding strategy.

Language Modelling Natural Language Understanding +2

Paper
Code

Unsupervised Cross-lingual Representation Learning at Scale

28 code implementations • ACL 2020 • Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.

Cross-Lingual Transfer Multilingual NLP +2

126,503

Paper
Code

The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali--English and Sinhala--English

1 code implementation • IJCNLP 2019 • Francisco Guzm{\'a}n, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc{'}Aurelio Ranzato

For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.

Machine Translation Translation

659

Paper
Code

Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models

no code implementations • 16 Oct 2019 • Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng

We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent.

Decoder Response Generation +2

Paper
Add Code

Facebook AI's WAT19 Myanmar-English Translation Task Submission

no code implementations • WS 2019 • Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc'Aurelio Ranzato

This paper describes Facebook AI's submission to the WAT 2019 Myanmar-English translation task.

Re-Ranking Translation

Paper
Add Code

The Source-Target Domain Mismatch Problem in Machine Translation

no code implementations • EACL 2021 • Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures and many events we experience in our every day life pertain only to the specific place we live in.

Machine Translation Translation

Paper
Add Code

On The Evaluation of Machine Translation Systems Trained With Back-Translation

1 code implementation • ACL 2020 • Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato, Michael Auli

Back-translation is a widely used data augmentation technique which leverages target monolingual data.

Data Augmentation Language Modelling +2

Paper
Code

RoBERTa: A Robustly Optimized BERT Pretraining Approach

59 code implementations • 26 Jul 2019 • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Ranked #1 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (Wasserstein Distance (WD) metric, using extra training data)

Document Image Classification Language Modelling +13

126,503

Paper
Code

Facebook FAIR's WMT19 News Translation Task Submission

5 code implementations • WS 2019 • Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov

This paper describes Facebook FAIR's submission to the WMT19 shared news translation task.

Ranked #1 on Machine Translation on WMT2019 English-German

Machine Translation Translation

126,503

Paper
Code

Real or Fake? Learning to Discriminate Machine from Human Generated Text

no code implementations • 7 Jun 2019 • Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc'Aurelio Ranzato, Arthur Szlam

Energy-based models (EBMs), a. k. a.

Language Modelling Text Generation

Paper
Add Code

Diverse Machine Translation with a Single Multinomial Latent Variable

no code implementations • ICLR 2019 • Tianxiao Shen, Myle Ott, Michael Auli, Marc’Aurelio Ranzato

There are many ways to translate a sentence into another language.

Machine Translation Sentence +1

Paper
Add Code

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

6 code implementations • NAACL 2019 • Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

Language Modelling Text Generation +1

29,428

Paper
Code

Mixture Models for Diverse Machine Translation: Tricks of the Trade

1 code implementation • 20 Feb 2019 • Tianxiao Shen, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Machine Translation Text Generation +1

29,428

Paper
Code

The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

2 code implementations • 4 Feb 2019 • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.

Machine Translation Translation

659

Paper
Code

Phrase-Based \& Neural Unsupervised Machine Translation

no code implementations • EMNLP 2018 • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc{'}Aurelio Ranzato

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs.

Denoising NMT +3

Paper
Add Code

Understanding Back-Translation at Scale

3 code implementations • EMNLP 2018 • Sergey Edunov, Myle Ott, Michael Auli, David Grangier

An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences.

Ranked #2 on Machine Translation on WMT2014 English-German (using extra training data)

Machine Translation Translation

29,428

Paper
Code

Scaling Neural Machine Translation

5 code implementations • WS 2018 • Myle Ott, Sergey Edunov, David Grangier, Michael Auli

Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine.

Ranked #12 on Machine Translation on WMT2014 English-French

Machine Translation Question Answering +1

29,428

Paper
Code

Phrase-Based & Neural Unsupervised Machine Translation

15 code implementations • EMNLP 2018 • Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

Ranked #2 on Machine Translation on WMT2016 English-Russian

NMT Sentence +2

126,503

Paper
Code

Analyzing Uncertainty in Neural Machine Translation

1 code implementation • ICML 2018 • Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations.

Machine Translation Sentence +2

Paper
Code

Classical Structured Prediction Losses for Sequence to Sequence Learning

1 code implementation • NAACL 2018 • Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.

Ranked #4 on Machine Translation on IWSLT2015 German-English