Search Results for author: Stephen Roller

Found 36 papers, 15 papers with code

Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

no code implementations • 16 Aug 2023 • JianGuo Zhang, Stephen Roller, Kun Qian, Zhiwei Liu, Rui Meng, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging sophisticated natural language understanding and natural language generation capabilities of pre-trained models.

Natural Language Understanding, Retrieval +1

Leveraging Implicit Feedback from Deployment Data in Dialogue

no code implementations • 26 Jul 2023 • Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston

We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations.

A Theory on Adam Instability in Large-Scale Machine Learning

no code implementations • 19 Apr 2023 • Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

We present a theory for the previously unexplained divergent behavior noticed in the training of large language models.

Language Modelling

Scaling Laws for Generative Mixed-Modal Language Models

no code implementations • 10 Jan 2023 • Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion parameters, trained on 5-100 billion tokens.

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

1 code implementation • Science 2022 • Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sash Mitts, Aditya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra

Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge.

1,238

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

2 code implementations • 5 Aug 2022 • Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and trained on a large number of user-defined tasks.

Continual Learning

10,426

OPT: Open Pre-trained Transformer Language Models

7 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.

Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs

Hate Speech Detection, Language Modelling +1

6,386

Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

1 code implementation • 24 Mar 2022 • Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston

We show that, when using SeeKeR as a dialogue model, it outperforms the state-of-the-art model BlenderBot 2 (Chen et al., 2021) on open-domain knowledge-grounded conversations for the same number of parameters, in terms of consistency, knowledge and per-turn engagingness.

Language Modelling, Retrieval

10,426

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

no code implementations • NLP4ConvAI (ACL) 2022 • Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston

At the heart of improving conversational AI is the open problem of how to evaluate conversations.

Dialogue Evaluation

Teaching Models new APIs: Domain-Agnostic Simulators for Task Oriented Dialogue

no code implementations • 13 Oct 2021 • Moya Chen, Paul A. Crook, Stephen Roller

We demonstrate that large language models are able to simulate Task Oriented Dialogues in novel domains, provided only with an API implementation and a list of goals.

Staircase Attention for Recurrent Processing of Sequences

1 code implementation • 8 Jun 2021 • Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston

Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture.

Language Modelling

136

Hash Layers For Large Sparse Models

no code implementations • NeurIPS 2021 • Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.

Language Modelling

Not All Memories are Created Equal: Learning to Forget by Expiring

1 code implementation • 13 May 2021 • Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality.

Ranked #4 on Language Modelling on enwik8

Language Modelling

136

Not All Memories are Created Equal: Learning to Expire

1 code implementation • 1 Jan 2021 • Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason E Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve state of the art results on long-context language modeling, reinforcement learning, and algorithmic tasks.

Language Modelling

Adding Chit-Chat to Enhance Task-Oriented Dialogues

1 code implementation • NAACL 2021 • Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie

Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations.

Dialogue Generation, Dialogue Understanding +1

Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

no code implementations • 22 Jun 2020 • Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet.

Continual Learning

Recipes for building an open-domain chatbot

7 code implementations • EACL 2021 • Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston

Building open-domain chatbots is a challenging area for machine learning research.

Chatbot

124,889

Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

1 code implementation • ACL 2020 • Margaret Li, Stephen Roller, Ilia Kulikov, Sean Welleck, Y-Lan Boureau, Kyunghyun Cho, Jason Weston

Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address.

10,426

The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

no code implementations • ACL 2020 • Kurt Shuster, Da Ju, Stephen Roller, Emily Dinan, Y-Lan Boureau, Jason Weston

We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images.

ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons

no code implementations • 6 Sep 2019 • Margaret Li, Jason Weston, Stephen Roller

While dialogue remains an important end-goal of natural language research, the difficulty of evaluation is an oft-quoted reason why it remains troublesome to make real progress towards its solution.

Dialogue Evaluation

Neural Text Generation with Unlikelihood Training

5 code implementations • ICLR 2020 • Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston

Neural text generation is a key tool in natural language applications, but it is well known that there are major problems at its core.

Blocking, Text Generation

308

What makes a good conversation? How controllable attributes affect human judgments

2 code implementations • NAACL 2019 • Abigail See, Stephen Roller, Douwe Kiela, Jason Weston

A good conversation requires balance -- between simplicity and detail; staying on topic and changing it; asking questions and answering them.

Specificity, Text Generation

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

no code implementations • ACL 2019 • Matt Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, Maximilian Nickel

Moreover -- and in contrast with other methods -- the hierarchical nature of hyperbolic space allows us to learn highly efficient representations and to improve the taxonomic consistency of the inferred hierarchies.

Wizard of Wikipedia: Knowledge-Powered Conversational agents

2 code implementations • ICLR 2019 • Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston

In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date.

Dialogue Generation

10,426

Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora

2 code implementations • ACL 2018 • Stephen Roller, Douwe Kiela, Maximilian Nickel

Methods for unsupervised hypernym detection may broadly be categorized according to two paradigms: pattern-based and distributional methods.

153

Distributional Modeling on a Diet: One-shot Word Learning from Text Only

no code implementations • IJCNLP 2017 • Su Wang, Stephen Roller, Katrin Erk

We test whether distributional models can do one-shot learning of definitional properties from text only.

One-Shot Learning

PIC a Different Word: A Simple Model for Lexical Substitution in Context

no code implementations • NAACL 2016 • Stephen Roller, Katrin Erk

Language Modelling

Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment

no code implementations • EMNLP 2016 • Stephen Roller, Katrin Erk

We consider the task of predicting lexical entailment using distributional vectors.

Lexical Entailment

MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification

no code implementations • NAACL 2016 • Ye Zhang, Stephen Roller, Byron Wallace

We introduce a novel, simple convolutional neural network (CNN) architecture, the multi-group norm constraint CNN (MGNC-CNN), that capitalizes on multiple sets of word embeddings for sentence classification.

General Classification, Sentence +2

Representing Meaning with a Combination of Logical and Distributional Models

1 code implementation • CL 2016 • I. Beltagy, Stephen Roller, Pengxiang Cheng, Katrin Erk, Raymond J. Mooney

In this paper, we focus on the three components of a practical system integrating logical and distributional models: 1) Parsing and task representation is the logic-based part where input problems are represented in probabilistic logic.

Lexical Entailment, Natural Language Inference +2