Along with this, we propose novel negative mining techniques in the scene graph space for improving attribute binding and relation understanding.
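The snippet does not spell out how those negatives are constructed; below is a minimal sketch of one plausible scheme, in which hard negative captions are produced by swapping attributes between objects (attribute binding) or reversing the arguments of a relation (relation understanding) in (subject, relation, object) triplets. All class, function, and template names are hypothetical, not the paper's API.

```python
# Hypothetical sketch: mining hard negatives in scene-graph space by
# (a) swapping attributes between two objects and (b) reversing a relation.
import random
from dataclasses import dataclass

@dataclass
class Triplet:
    subject: str
    subject_attr: str
    relation: str
    object: str
    object_attr: str

    def to_caption(self) -> str:
        return (f"a {self.subject_attr} {self.subject} {self.relation} "
                f"a {self.object_attr} {self.object}")

def attribute_swap_negative(t: Triplet) -> str:
    # "a black dog chasing a white cat" -> "a white dog chasing a black cat"
    return Triplet(t.subject, t.object_attr, t.relation,
                   t.object, t.subject_attr).to_caption()

def relation_swap_negative(t: Triplet) -> str:
    # "a black dog chasing a white cat" -> "a white cat chasing a black dog"
    return Triplet(t.object, t.object_attr, t.relation,
                   t.subject, t.subject_attr).to_caption()

t = Triplet("dog", "black", "chasing", "cat", "white")
positive = t.to_caption()
negative = random.choice([attribute_swap_negative, relation_swap_negative])(t)
print(positive, "|", negative)
```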
no code implementations • Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee
We use English-Taiwanese Hokkien as a case study and present an end-to-end solution, from training data collection and modeling choices to benchmark dataset release.
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings.
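The snippet says the corpus is mined from real speech; a common recipe for such mining is to embed utterances from both languages in a shared space and keep pairs with a high margin-normalized similarity. The sketch below shows that margin scoring step only, with random arrays standing in for speech-encoder outputs and an illustrative threshold; it is not SpeechMatrix's exact pipeline.

```python
# Minimal sketch of margin-based mining: score each candidate pair by its
# cosine similarity normalized by the average similarity to the k nearest
# neighbours of both sides, then keep pairs above a threshold.
import numpy as np

def margin_scores(src, tgt, k=4):
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src @ tgt.T                                     # cosine similarities
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)   # avg sim of k NNs per source
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)   # avg sim of k NNs per target
    return 2 * sim / (knn_src[:, None] + knn_tgt[None, :])

src_emb = np.random.randn(100, 512)   # stand-in source-language speech embeddings
tgt_emb = np.random.randn(120, 512)   # stand-in target-language speech embeddings
scores = margin_scores(src_emb, tgt_emb)
pairs = np.argwhere(scores > 1.06)    # threshold is illustrative only
```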
In this work, we adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks.
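One way to prompt a discriminative model like ELECTRA is to fill the template with each candidate verbalizer and score it with the replaced-token-detection head, choosing the label whose verbalizer looks most "original" to the discriminator. The sketch below illustrates that idea with the public google/electra-base-discriminator checkpoint; the template, verbalizers, and scoring details are illustrative assumptions, not necessarily the paper's exact setup.

```python
# Sketch: score candidate verbalizers with ELECTRA's replaced-token-detection
# head and pick the label whose verbalizer is judged most likely "original".
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

tok = AutoTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")
model.eval()

def score_label(review: str, verbalizer: str) -> float:
    text = f"{review} It was {verbalizer}."
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]          # per-token "replaced" logits
    # locate the (last) verbalizer token and return its probability of being original
    idx = tok(verbalizer, add_special_tokens=False)["input_ids"][0]
    pos = (enc["input_ids"][0] == idx).nonzero()[-1].item()
    return torch.sigmoid(-logits[pos]).item()

review = "The plot was predictable and the acting was flat."
label = max(["great", "terrible"], key=lambda v: score_label(review, v))
print(label)
```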
Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult.
Self-supervised pretraining has made few-shot learning possible for many NLP tasks.
no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.
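For readers unfamiliar with the sparse models being compared, the block below is a minimal top-2 gated mixture-of-experts feed-forward layer, the building block whose scaling is contrasted with dense FFNs. It omits the capacity limits, load-balancing losses, and expert parallelism used in practice, and is not the paper's implementation.

```python
# Minimal top-2 gated mixture-of-experts feed-forward layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        gate_probs = F.softmax(self.gate(x), dim=-1)
        weights, idx = gate_probs.topk(self.top_k, dim=-1)  # route each token to top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoEFeedForward()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)   # torch.Size([16, 512])
```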
2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li
Large-scale generative language models such as GPT-3 are competitive few-shot learners.
Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages.
Our proposed fine-tuning objective leads to models that are more robust to different levels of noise in the fine-tuning training data, and can generalize better to related tasks with limited labeled data.
A large array of pretrained models is available to the biomedical NLP (BioNLP) community.
Unsupervised pre-training has led to much recent progress in natural language understanding.
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER.
Ranked #15 on Question Answering on HotpotQA
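The core loop behind multi-hop dense retrieval is iterative: retrieve a first passage for the question, append it to the query, re-encode, and retrieve the next hop. The sketch below shows only that loop; the toy character-hashing encoder and the three-sentence corpus are stand-ins for the paper's trained dense encoders and full index.

```python
# Conceptual sketch of iterative (multi-hop) dense retrieval.
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.standard_normal((256, 64))

def encode(text: str) -> np.ndarray:
    """Stand-in encoder: hashed bag-of-characters followed by a random projection."""
    counts = np.zeros(256)
    for ch in text.lower():
        counts[ord(ch) % 256] += 1
    v = counts @ PROJ
    return v / (np.linalg.norm(v) + 1e-9)

corpus = [
    "The Eiffel Tower is located in Paris.",
    "Paris is the capital of France.",
    "Gustave Eiffel's company designed the tower.",
]
index = np.stack([encode(p) for p in corpus])

def retrieve(query_vec: np.ndarray) -> int:
    return int(np.argmax(index @ query_vec))   # inner-product search over the corpus

question = "In which country is the tower designed by Gustave Eiffel's company?"
hop1 = corpus[retrieve(encode(question))]                # first-hop passage
hop2 = corpus[retrieve(encode(question + " " + hop1))]   # query conditioned on hop 1
print(hop1, "->", hop2)
```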
The resulting method offers a compelling solution for using large-scale pre-trained models at a fraction of the computational cost when multiple tasks are performed on the same text.
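The snippet does not describe the method itself; one illustrative reading of the claim is that the expensive shared encoder runs once per text and its cached representation is reused by several lightweight task heads. The sketch below shows that generic pattern only and should not be taken as the paper's specific technique.

```python
# Illustrative pattern: encode a text once, reuse the cached representation
# across multiple small task heads instead of re-running the full model per task.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Embedding(30522, 768),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
        num_layers=2,
    ),
)

heads = nn.ModuleDict({
    "sentiment": nn.Linear(768, 2),
    "topic": nn.Linear(768, 20),
    "toxicity": nn.Linear(768, 2),
})

token_ids = torch.randint(0, 30522, (1, 128))      # stand-in tokenized text
with torch.no_grad():
    shared = encoder(token_ids).mean(dim=1)        # encode once, pool to one vector

# Each task reuses the cached representation; only the tiny heads run per task.
outputs = {task: head(shared) for task, head in heads.items()}
print({task: tuple(o.shape) for task, o in outputs.items()})
```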
Models trained with our new objective yield significant improvements on the fact completion task.
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.
Ranked #2 on Text Classification on arXiv-10
Our Knowledge-Augmented Language Model (KALM) continues this line of work by augmenting a traditional model with a knowledge base (KB).
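A common way to combine a language model with a KB is to mix the general next-token distribution with distributions over entity names from the KB, weighted by a predicted latent type. The sketch below illustrates that mixture with a toy vocabulary and a two-type KB; the numbers and structure are illustrative, not KALM's exact formulation.

```python
# Sketch of a KB-augmented next-token distribution: interpolate a general
# vocabulary distribution with per-type distributions over KB entity names.
import torch
import torch.nn.functional as F

vocab = ["the", "visited", "paris", "einstein", "curie", "france", "<unk>"]
kb = {"PERSON": ["einstein", "curie"], "LOCATION": ["paris", "france"]}

general_logits = torch.randn(len(vocab))      # stand-in vocabulary scores from the LM
type_logits = torch.randn(1 + len(kb))        # scores for [general, PERSON, LOCATION]
type_probs = F.softmax(type_logits, dim=-1)

def type_distribution(names):
    """Uniform distribution over a KB type's entity names, in vocabulary space."""
    p = torch.zeros(len(vocab))
    for n in names:
        p[vocab.index(n)] = 1.0 / len(names)
    return p

p_general = F.softmax(general_logits, dim=-1)
p_next = type_probs[0] * p_general
for i, names in enumerate(kb.values(), start=1):
    p_next = p_next + type_probs[i] * type_distribution(names)

print(p_next.sum())   # the mixture still sums to 1
```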