To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion parameters, trained on 5-100 billion tokens.
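Scaling analyses of this kind typically fit a saturating power law to the measured losses. Below is a minimal, hypothetical sketch of such a fit; the functional form L(N) = E + a·N^(−α) and the synthetic data points are illustrative assumptions, not the paper's actual results.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, alpha, irreducible):
    # L(N) = E + a * N^(-alpha): a standard saturating power-law form
    return irreducible + a * n_params ** (-alpha)

# Synthetic stand-in losses for illustration only (not the paper's data)
np.random.seed(0)
n = np.array([8e6, 1e8, 1e9, 1e10, 3e10])
loss = scaling_law(n, a=28.0, alpha=0.25, irreducible=1.7) \
       + np.random.normal(0, 0.01, n.size)

(a, alpha, irr), _ = curve_fit(scaling_law, n, loss, p0=(10.0, 0.3, 1.0))
print(f"fitted exponent alpha={alpha:.3f}, irreducible loss={irr:.2f}")
```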
1 code implementation • Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sasha Mitts, Adithya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra
Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge.
1 code implementation • 5 Aug 2022 • Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston
We present BlenderBot 3, a 175B-parameter dialogue model capable of open-domain conversation, with access to the internet and a long-term memory, and trained on a large number of user-defined tasks.
2 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs
We show that, when using SeeKeR as a dialogue model, it outperforms the state-of-the-art model BlenderBot 2 (Chen et al., 2021) on open-domain knowledge-grounded conversations for the same number of parameters, in terms of consistency, knowledge and per-turn engagingness.
At the heart of improving conversational AI is the open problem of how to evaluate conversations.
We demonstrate that large language models are able to simulate Task Oriented Dialogues in novel domains, provided only with an API implementation and a list of goals.
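A heavily simplified sketch of that simulation loop follows; `complete` is a stub standing in for any LLM completion call, and the API spec, goal, and prompt format are hypothetical choices for illustration, not the paper's.

```python
def complete(prompt: str) -> str:
    # Stub for an LLM completion API; replace with a real model call.
    return "SYSTEM: I found a table for two at 7pm. Shall I book it?"

API_DOC = "book_restaurant(name: str, party_size: int, time: str) -> confirmation"
goal = "Book a table for two at an Italian place tonight."

dialogue = [f"USER: {goal}"]
for _ in range(2):  # alternate simulated turns, conditioned on API + goal
    prompt = f"API: {API_DOC}\nGoal: {goal}\n" + "\n".join(dialogue) + "\n"
    dialogue.append(complete(prompt).strip())
print("\n".join(dialogue))
```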
Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture.
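For reference, a single self-attention layer of the kind these architectures stack reduces to a few lines; this is the standard scaled dot-product formulation, not any one paper's variant.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a full input sequence.

    x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projections.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # (seq_len, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
out = self_attention(x, *(rng.normal(size=(16, 8)) for _ in range(3)))
print(out.shape)  # (5, 8)
```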
We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.
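A minimal sketch of the idea: tokens are routed to one of several feed-forward "expert" blocks by a fixed hash of their token ID, so routing needs no learned gating. The modulo hash below is one arbitrary choice of hash function, assumed for illustration.

```python
import torch
import torch.nn as nn

class HashRoutedFFN(nn.Module):
    """Sparse FFN: each token uses only the expert its hashed ID selects."""

    def __init__(self, d_model, d_hidden, n_experts):
        super().__init__()
        self.n_experts = n_experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts))

    def forward(self, x, token_ids):
        # x: (n_tokens, d_model); token_ids: (n_tokens,) int64
        buckets = token_ids % self.n_experts  # fixed, parameter-free hash
        out = torch.empty_like(x)
        for e in range(self.n_experts):
            mask = buckets == e
            if mask.any():
                out[mask] = self.experts[e](x[mask])
        return out

layer = HashRoutedFFN(d_model=16, d_hidden=64, n_experts=4)
tokens = torch.randint(0, 1000, (10,))
print(layer(torch.randn(10, 16), tokens).shape)  # torch.Size([10, 16])
```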
We demonstrate that Expire-Span can help models identify and retain critical information, and show it can achieve state-of-the-art results on long-context language modeling, reinforcement learning tasks specifically designed to challenge this functionality, and algorithmic tasks.
Ranked #3 on Language Modelling on enwik8
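A rough sketch of the retention mechanism: each stored memory gets a predicted span, and attention to it fades linearly to zero once its age exceeds that span. The ramp width and example values below are illustrative assumptions.

```python
import torch

def expire_mask(spans, current_step, mem_steps, ramp=32.0):
    """Soft retention mask in [0, 1] for each stored memory.

    spans: (n_mem,) predicted lifetimes; mem_steps: (n_mem,) timestep at
    which each memory was written. Memories older than their span fade
    out linearly over `ramp` steps instead of dropping abruptly.
    """
    remaining = spans - (current_step - mem_steps)
    return torch.clamp(remaining / ramp + 1.0, 0.0, 1.0)

spans = torch.tensor([10.0, 100.0, 5.0])
mask = expire_mask(spans, current_step=50, mem_steps=torch.tensor([0, 0, 40]))
print(mask)  # long-expired -> 0.0, within span -> 1.0, just-expired -> fading
```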
Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations.
no code implementations • 22 Jun 2020 • Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson
We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet.
Building open-domain chatbots is a challenging area for machine learning research.
Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address.
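One remedy from this line of work is the unlikelihood objective, which pushes probability mass away from unwanted tokens (e.g., context repeats). A minimal sketch of the token-level term, with the candidate-selection strategy left out:

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, neg_candidates):
    """Penalize probability assigned to negative candidate tokens.

    logits: (vocab_size,) for one decoding step;
    neg_candidates: LongTensor of token IDs to push down.
    Loss = -sum(log(1 - p(c))) over candidates c.
    """
    probs = F.softmax(logits, dim=-1)
    p_neg = probs[neg_candidates].clamp(max=1.0 - 1e-6)  # avoid log(0)
    return -torch.log1p(-p_neg).sum()

logits = torch.randn(100)
print(unlikelihood_loss(logits, torch.tensor([3, 17])))
```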
We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images.
While dialogue remains an important end-goal of natural language research, the difficulty of evaluation is an oft-quoted reason why it remains troublesome to make real progress towards its solution.
Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core.
A good conversation requires balance -- between simplicity and detail; staying on topic and changing it; asking questions and answering them.
Moreover -- and in contrast with other methods -- the hierarchical nature of hyperbolic space allows us to learn highly efficient representations and to improve the taxonomic consistency of the inferred hierarchies.
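The efficiency claim rests on the geometry: distances in the Poincaré ball grow rapidly toward the boundary, so tree-like hierarchies embed with low distortion in few dimensions. For reference, the standard distance used by such hyperbolic embeddings:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u * u)) * (1.0 - np.sum(v * v))
    return np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps))

root = np.array([0.0, 0.0])           # general concepts sit near the origin
leaf = np.array([0.0, 0.95])          # specific concepts sit near the boundary
print(poincare_distance(root, leaf))  # far larger than the Euclidean 0.95
```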
In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date.
Methods for unsupervised hypernym detection may broadly be categorized according to two paradigms: pattern-based and distributional methods.
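To make the pattern-based paradigm concrete, here is a toy extractor over two classic Hearst patterns; real systems use many more patterns plus parsing, so treat this as a sketch only.

```python
import re

# (pattern, extractor) pairs: each match yields a (hyponym, hypernym) tuple
HEARST_PATTERNS = [
    (re.compile(r"(\w+) such as (\w+)"), lambda m: (m.group(2), m.group(1))),
    (re.compile(r"(\w+) and other (\w+)"), lambda m: (m.group(1), m.group(2))),
]

def extract_hypernym_pairs(text):
    pairs = []
    for pattern, to_pair in HEARST_PATTERNS:
        pairs.extend(to_pair(m) for m in pattern.finditer(text))
    return pairs

print(extract_hypernym_pairs("mammals such as dogs, and cats and other pets"))
# [('dogs', 'mammals'), ('cats', 'pets')]
```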
We introduce a novel, simple convolutional neural network (CNN) architecture - multi-group norm constraint CNN (MGNC-CNN) - that capitalizes on multiple sets of word embeddings for sentence classification.
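A compressed sketch of that architecture, assuming PyTorch: one small convolution branch per embedding set, max-over-time pooling, and concatenation before the classifier. The per-group norm constraint itself (weight clipping on the final layer) is omitted here for brevity.

```python
import torch
import torch.nn as nn

class MGNC_CNN(nn.Module):
    """One conv branch per word-embedding set; pooled features concatenated."""

    def __init__(self, embed_dims, n_filters=100, n_classes=2):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(d, n_filters, kernel_size=3, padding=1) for d in embed_dims)
        self.classifier = nn.Linear(n_filters * len(embed_dims), n_classes)

    def forward(self, embedded):
        # embedded: list of (batch, seq_len, dim_i) tensors, one per embedding set
        feats = []
        for conv, x in zip(self.convs, embedded):
            h = torch.relu(conv(x.transpose(1, 2)))   # (batch, n_filters, seq_len)
            feats.append(h.max(dim=2).values)         # max-over-time pooling
        return self.classifier(torch.cat(feats, dim=1))

model = MGNC_CNN(embed_dims=[50, 300])
inputs = [torch.randn(4, 20, 50), torch.randn(4, 20, 300)]
print(model(inputs).shape)  # torch.Size([4, 2])
```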
In this paper, we focus on the three components of a practical system integrating logical and distributional models: 1) Parsing and task representation is the logic-based part where input problems are represented in probabilistic logic.