5 code implementations • 4 Mar 2022 • Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe
In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.
no code implementations • 22 Sep 2021 • Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano
Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves.
1 code implementation • NeurIPS 2020 • Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul F. Christiano
We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning.
1 code implementation • 2 Sep 2020 • Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning.
no code implementations • 21 Jul 2020 • Shagun Sodhani, Mayoore S. Jaiswal, Lauren Baker, Koustuv Sinha, Carl Shneider, Peter Henderson, Joel Lehman, Ryan Lowe
This report documents ideas for improving the field of machine learning, which arose from discussions at the ML Retrospectives workshop at NeurIPS 2019.
1 code implementation • ACL 2020 • Koustuv Sinha, Prasanna Parthasarathi, Jasmine Wang, Ryan Lowe, William L. Hamilton, Joelle Pineau
Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue.
1 code implementation • ICLR 2020 • Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau
A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training.
no code implementations • WS 2019 • Abhinav Gupta, Ryan Lowe, Jakob Foerster, Douwe Kiela, Joelle Pineau
Once the meta-learning agent is able to quickly adapt to each population of agents, it can be deployed in new populations, including populations speaking human language.
1 code implementation • 12 Mar 2019 • Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin
How do we know if communication is emerging in a multi-agent system?
3 code implementations • 31 Jan 2019 • Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W. black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston
We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots.
1 code implementation • 24 Nov 2017 • Peter Henderson, Koustuv Sinha, Nicolas Angelard-Gontier, Nan Rosemary Ke, Genevieve Fried, Ryan Lowe, Joelle Pineau
The use of dialogue systems as a medium for human-machine interaction is an increasingly prevalent paradigm.
no code implementations • EMNLP 2017 • Teng Long, Emmanuel Bengio, Ryan Lowe, Jackie Chi Kit Cheung, Doina Precup
Humans interpret texts with respect to some background information, or world knowledge, and we would like to develop automatic reading comprehension systems that can do the same.
1 code implementation • ACL 2017 • Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau
Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem.
80 code implementations • NeurIPS 2017 • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch
We explore deep reinforcement learning methods for multi-agent domains.
Ranked #1 on
SMAC+
on Def_Infantry_sequential
no code implementations • 1 Jan 2017 • Ryan Lowe, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, Joelle Pineau
In this paper, we analyze neural network-based dialogue systems trained in an end-to-end manner using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words.
no code implementations • 18 Nov 2016 • Iulian Vlad Serban, Ryan Lowe, Laurent Charlin, Joelle Pineau
Researchers have recently started investigating deep neural networks for dialogue applications.
3 code implementations • 24 Jul 2016 • Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL).
Ranked #8 on
Machine Translation
on IWSLT2015 English-German
9 code implementations • 19 May 2016 • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio
Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue.
no code implementations • WS 2016 • Ryan Lowe, Iulian V. Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau
An open challenge in constructing dialogue systems is developing methods for automatically learning dialogue strategies from large amounts of unlabelled data.
no code implementations • ACL 2016 • Teng Long, Ryan Lowe, Jackie Chi Kit Cheung, Doina Precup
Recent work in learning vector-space embeddings for multi-relational data has focused on combining relational information derived from knowledge bases with distributional information derived from large text corpora.
2 code implementations • EMNLP 2016 • Chia-Wei Liu, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent Charlin, Joelle Pineau
We investigate evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available.
4 code implementations • 17 Dec 2015 • Iulian Vlad Serban, Ryan Lowe, Peter Henderson, Laurent Charlin, Joelle Pineau
During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models.
21 code implementations • WS 2015 • Ryan Lowe, Nissan Pow, Iulian Serban, Joelle Pineau
This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words.