Search Results for author: Stephen Roller

Found 26 papers, 9 papers with code

Staircase Attention for Recurrent Processing of Sequences

no code implementations8 Jun 2021 Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston

Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture.

Language Modelling

Hash Layers For Large Sparse Models

no code implementations8 Jun 2021 Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.

Language Modelling

Not All Memories are Created Equal: Learning to Forget by Expiring

1 code implementation13 May 2021 Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality.

Language Modelling

Not All Memories are Created Equal: Learning to Expire

1 code implementation1 Jan 2021 Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason E Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve state of the art results on long-context language modeling, reinforcement learning, and algorithmic tasks.

Language Modelling

Adding Chit-Chat to Enhance Task-Oriented Dialogues

1 code implementation NAACL 2021 Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie

Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e. g., booking hotels), open-domain chatbots aim at making socially engaging conversations.

Dialogue Generation Dialogue Understanding +1

Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

no code implementations22 Jun 2020 Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet.

Continual Learning

Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

no code implementations ACL 2020 Margaret Li, Stephen Roller, Ilia Kulikov, Sean Welleck, Y-Lan Boureau, Kyunghyun Cho, Jason Weston

Generative dialogue models currently suffer from a number of problems which standard maximum likelihood training does not address.

The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

no code implementations ACL 2020 Kurt Shuster, Da Ju, Stephen Roller, Emily Dinan, Y-Lan Boureau, Jason Weston

We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images.

ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons

no code implementations6 Sep 2019 Margaret Li, Jason Weston, Stephen Roller

While dialogue remains an important end-goal of natural language research, the difficulty of evaluation is an oft-quoted reason why it remains troublesome to make real progress towards its solution.

Dialogue Evaluation

Neural Text Generation with Unlikelihood Training

1 code implementation ICLR 2020 Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston

Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core.

Text Generation

What makes a good conversation? How controllable attributes affect human judgments

1 code implementation NAACL 2019 Abigail See, Stephen Roller, Douwe Kiela, Jason Weston

A good conversation requires balance -- between simplicity and detail; staying on topic and changing it; asking questions and answering them.

Text Generation

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

no code implementations ACL 2019 Matt Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, Maximilian Nickel

Moreover -- and in contrast with other methods -- the hierarchical nature of hyperbolic space allows us to learn highly efficient representations and to improve the taxonomic consistency of the inferred hierarchies.

Wizard of Wikipedia: Knowledge-Powered Conversational agents

2 code implementations ICLR 2019 Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston

In open-domain dialogue intelligent agents should exhibit the use of knowledge, however there are few convincing demonstrations of this to date.

Dialogue Generation

Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora

2 code implementations ACL 2018 Stephen Roller, Douwe Kiela, Maximilian Nickel

Methods for unsupervised hypernym detection may broadly be categorized according to two paradigms: pattern-based and distributional methods.

Distributional Modeling on a Diet: One-shot Word Learning from Text Only

no code implementations IJCNLP 2017 Su Wang, Stephen Roller, Katrin Erk

We test whether distributional models can do one-shot learning of definitional properties from text only.

One-Shot Learning

MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification

no code implementations NAACL 2016 Ye Zhang, Stephen Roller, Byron Wallace

We introduce a novel, simple convolution neural network (CNN) architecture - multi-group norm constraint CNN (MGNC-CNN) that capitalizes on multiple sets of word embeddings for sentence classification.

General Classification Sentence Classification +1

Representing Meaning with a Combination of Logical and Distributional Models

1 code implementation CL 2016 I. Beltagy, Stephen Roller, Pengxiang Cheng, Katrin Erk, Raymond J. Mooney

In this paper, we focus on the three components of a practical system integrating logical and distributional models: 1) Parsing and task representation is the logic-based part where input problems are represented in probabilistic logic.

Lexical Entailment Natural Language Inference

Cannot find the paper you are looking for? You can Submit a new open access paper.