Search Results for author: Naman Goyal

Found 41 papers, 21 papers with code

Multilingual Translation from Denoising Pre-Training

no code implementations • Findings (ACL) 2021 • Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan

Denoising Translation

Paper
Add Code

Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation

no code implementations • WMT (EMNLP) 2021 • Guillaume Wenzek, Vishrav Chaudhary, Angela Fan, Sahir Gomez, Naman Goyal, Somya Jain, Douwe Kiela, Tristan Thrush, Francisco Guzmán

There were a total of 10 participating teams for the tasks, with a total of 151 intermediate model submissions and 13 final models.

Machine Translation Translation

Paper
Add Code

Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment

no code implementations • WMT (EMNLP) 2020 • Philipp Koehn, Vishrav Chaudhary, Ahmed El-Kishky, Naman Goyal, Peng-Jen Chen, Francisco Guzmán

Following two preceding WMT Shared Task on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we posed again the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality data to be used to train ma-chine translation systems.

Sentence Translation

Paper
Add Code

Facebook AI’s WMT20 News Translation Task Submission

no code implementations • WMT (EMNLP) 2020 • Peng-Jen Chen, Ann Lee, Changhan Wang, Naman Goyal, Angela Fan, Mary Williamson, Jiatao Gu

We approach the low resource problem using two main strategies, leveraging all available data and adapting the system to the target news domain.

Data Augmentation Translation

Paper
Add Code

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

1 code implementation • 31 Aug 2023 • Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa

We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs).

Cross-Lingual Transfer Machine Reading Comprehension +2

304

Paper
Code

Llama 2: Open Foundation and Fine-Tuned Chat Models

14 code implementations • 18 Jul 2023 • Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.

Ranked #2 on Question Answering on PubChemQA

Arithmetic Reasoning +5

53,354

Paper
Code

A Theory on Adam Instability in Large-Scale Machine Learning

no code implementations • 19 Apr 2023 • Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

We present a theory for the previously unexplained divergent behavior noticed in the training of large language models.

Language Modelling

Paper
Add Code

LLaMA: Open and Efficient Foundation Language Models

44 code implementations • arXiv 2023 • Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.

Ranked #3 on Question Answering on OBQA

Arithmetic Reasoning Code Generation +6

125,725

Paper
Code

Text-To-4D Dynamic Scene Generation

no code implementations • 26 Jan 2023 • Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.

Scene Generation

Paper
Add Code

XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models

2 code implementations • 25 Jan 2023 • Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa

Large multilingual language models typically rely on a single vocabulary shared across 100+ languages.

named-entity-recognition Named Entity Recognition +4

Paper
Code

Scaling Laws for Generative Mixed-Modal Language Models

no code implementations • 10 Jan 2023 • Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion, trained on 5-100 billion tokens.

Paper
Add Code

A survey on Self Supervised learning approaches for improving Multimodal representation learning

no code implementations • 20 Oct 2022 • Naman Goyal

Recently self supervised learning has seen explosive growth and use in variety of machine learning tasks because of its ability to avoid the cost of annotating large-scale datasets.

Representation Learning Self-Supervised Learning +1

Paper
Add Code

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

2 code implementations • 5 Aug 2022 • Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks.

Continual Learning

10,431

Paper
Code

On the Role of Bidirectionality in Language Model Pre-Training

no code implementations • 24 May 2022 • Mikel Artetxe, Jingfei Du, Naman Goyal, Luke Zettlemoyer, Ves Stoyanov

Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult.

Language Modelling Text Infilling

Paper
Add Code

Lifting the Curse of Multilinguality by Pre-training Modular Transformers

no code implementations • NAACL 2022 • Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe

Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.

named-entity-recognition Named Entity Recognition +3

Paper
Add Code

OPT: Open Pre-trained Transformer Language Models

7 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.

Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs

Decoder Hate Speech Detection +2

6,388

Paper
Code

How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

no code implementations • AMTA 2022 • Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzman

Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus.

Machine Translation Translation

Paper
Add Code

Graph Neural Networks for Image Classification and Reinforcement Learning using Graph representations

no code implementations • 7 Mar 2022 • Naman Goyal, David Steiner

In this paper, we will evaluate the performance of graph neural networks in two distinct domains: computer vision and reinforcement learning.

Image Classification Inductive Bias +3

Paper
Add Code

CM3: A Causal Masked Multimodal Model of the Internet

no code implementations • 19 Jan 2022 • Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer

We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens.

Entity Disambiguation Entity Linking

Paper
Add Code

Efficient Large Scale Language Modeling with Mixtures of Experts

no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.

Language Modelling

Paper
Add Code

Few-shot Learning with Multilingual Language Models

2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

Large-scale generative language models such as GPT-3 are competitive few-shot learners.

Cross-Lingual Transfer Few-Shot Learning +5

29,340

Paper
Code

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

2 code implementations • 17 Nov 2021 • Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7. 4 BLEU over 21 translation directions into English.

Ranked #1 on Language Identification on VoxLingua107 (using extra training data)

Language Identification Representation Learning +3

29,334

Paper
Code

FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

no code implementations • ACL (IWSLT) 2021 • Yun Tang, Hongyu Gong, Xian Li, Changhan Wang, Juan Pino, Holger Schwenk, Naman Goyal

In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task.

Transfer Learning Translation

Paper
Add Code

The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

2 code implementations • 6 Jun 2021 • Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc'Aurelio Ranzato, Francisco Guzman, Angela Fan

One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks.

Machine Translation Translation

660

Paper
Code

Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

1 code implementation • ACL 2021 • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages.

Denoising Machine Translation +2

Paper
Code

Larger-Scale Transformers for Multilingual Masked Language Modeling

no code implementations • ACL (RepL4NLP) 2021 • Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau

Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0. 3% on average while handling 99 more languages.

Masked Language Modeling XLM-R

Paper
Add Code

BASE Layers: Simplifying Training of Large, Sparse Models

1 code implementation • 30 Mar 2021 • Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer

Sparse layers can dramatically improve the efficiency of training and inference by routing each token to specialized expert modules that contain only a small fraction of the model parameters.

29,334

Paper
Code

Multilingual Autoregressive Entity Linking

1 code implementation • 23 Mar 2021 • Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time.

Ranked #2 on Entity Disambiguation on Mewsli-9 (using extra training data)

Entity Disambiguation Entity Linking

739

Paper
Code

Facebook AI's WMT20 News Translation Task Submission

no code implementations • 16 Nov 2020 • Peng-Jen Chen, Ann Lee, Changhan Wang, Naman Goyal, Angela Fan, Mary Williamson, Jiatao Gu

We approach the low resource problem using two main strategies, leveraging all available data and adapting the system to the target news domain.

Data Augmentation Translation

Paper
Add Code

Beyond English-Centric Multilingual Machine Translation

7 code implementations • 21 Oct 2020 • Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

Machine Translation Translation

125,725

Paper
Code

Better Fine-Tuning by Reducing Representational Collapse

3 code implementations • ICLR 2021 • Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta

Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods.

Ranked #1 on Cross-Lingual Natural Language Inference on XNLI Zero-Shot English-to-Spanish

Abstractive Text Summarization Cross-Lingual Natural Language Inference

29,334

Paper
Code

Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

5 code implementations • 2 Aug 2020 • Yuqing Tang, Chau Tran, Xi-An Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan

Recent work demonstrates the potential of multilingual pretraining of creating one model that can be used for various tasks in different languages.

Machine Translation Translation

125,725

Paper
Code

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

9 code implementations • NeurIPS 2020 • Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks.

Ranked #4 on Question Answering on WebQuestions

Fact Verification Question Answering +3

125,725

Paper
Code

Recipes for building an open-domain chatbot

7 code implementations • EACL 2021 • Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston

Building open-domain chatbots is a challenging area for machine learning research.

Chatbot

125,725

Paper
Code

Multilingual Denoising Pre-training for Neural Machine Translation

5 code implementations • 22 Jan 2020 • Yinhan Liu, Jiatao Gu, Naman Goyal, Xi-An Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks.

Decoder Denoising +3

125,725

Paper
Code

Unsupervised Cross-lingual Representation Learning at Scale

27 code implementations • ACL 2020 • Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.

Cross-Lingual Transfer Multilingual NLP +2

125,725

Paper
Code

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

43 code implementations • ACL 2020 • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.

Ranked #3 on Open-Domain Question Answering on ELI5

Abstractive Text Summarization Decoder +6

125,725

Paper
Code

RoBERTa: A Robustly Optimized BERT Pretraining Approach

59 code implementations • 26 Jul 2019 • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Ranked #1 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (Wasserstein Distance (WD) metric, using extra training data)

Document Image Classification Language Modelling +13

125,725

Paper
Code

LearningToQuestion at SemEval 2017 Task 3: Ranking Similar Questions by Learning to Rank Using Rich Features

no code implementations • SEMEVAL 2017 • Naman Goyal

This paper describes our official entry LearningToQuestion for SemEval 2017 task 3 community question answer, subtask B.

Information Retrieval Learning-To-Rank +2

Paper
Add Code

A Joint Model of Rhetorical Discourse Structure and Summarization

no code implementations • WS 2016 • Naman Goyal, Jacob Eisenstein

Structured Prediction

Paper
Add Code

The Social Dynamics of Language Change in Online Networks

no code implementations • 7 Sep 2016 • Rahul Goel, Sandeep Soni, Naman Goyal, John Paparrizos, Hanna Wallach, Fernando Diaz, Jacob Eisenstein

Language change is a complex social phenomenon, revealing pathways of communication and sociocultural influence.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.