Search Results for author: Natasha Jaques

Found 19 papers, 9 papers with code

Environment Generation for Zero-Shot Compositional Reinforcement Learning

no code implementations NeurIPS 2021 Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Manoj Tiwari, Honglak Lee, Aleksandra Faust

We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments.

Explore and Control with Adversarial Surprise

1 code implementation ICML Workshop URL 2021 Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang, Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine

In this paper, we propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.

Joint Attention for Multi-Agent Coordination and Social Learning

no code implementations 15 Apr 2021 Dennis Lee, Natasha Jaques, Chase Kew, Jiaxing Wu, Douglas Eck, Dale Schuurmans, Aleksandra Faust

We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents.

Adversarial Environment Generation for Learning to Navigate the Web

no code implementations 2 Mar 2021 Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust

The regret objective trains the adversary to design a curriculum of environments that are "just-the-right-challenge" for the navigator agents; our results show that over time, the adversary learns to generate increasingly complex web navigation tasks.

Decision Making

Human-centric Dialog Training via Offline Reinforcement Learning

1 code implementation EMNLP 2020 Natasha Jaques, Judy Hanwen Shen, Asma Ghandeharioun, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, Rosalind Picard

We start by hosting models online, and gather human feedback from real-time, open-ended conversations, which we then use to train and improve the models using offline reinforcement learning (RL).

Language Modelling, Offline RL

Emergent Social Learning via Multi-agent Reinforcement Learning

no code implementations 1 Oct 2020 Kamal Ndousse, Douglas Eck, Sergey Levine, Natasha Jaques

We analyze the reasons for this deficiency, and show that by imposing constraints on the training environment and introducing a model-based auxiliary loss we are able to obtain generalized social learning policies which enable agents to: i) discover complex skills that are not learned from single-agent training, and ii) adapt online to novel environments by taking cues from experts present in the new environment.

Imitation Learning, Multi-agent Reinforcement Learning

Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog

no code implementations ICLR 2020 Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

This is a critical shortcoming for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact with the environment -- e.g., systems that learn from human interaction.

OpenAI Gym, Open-Domain Dialog

Hierarchical Reinforcement Learning for Open-Domain Dialog

1 code implementation 17 Sep 2019 Abdelrhman Saleh, Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Rosalind Picard

Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text.

Hierarchical Reinforcement Learning, Open-Domain Dialog

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

1 code implementation 30 Jun 2019 Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment.

Open-Domain Dialog, Q-Learning

Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems

2 code implementations NeurIPS 2019 Asma Ghandeharioun, Judy Hanwen Shen, Natasha Jaques, Craig Ferguson, Noah Jones, Agata Lapedriza, Rosalind Picard

To investigate the strengths of this novel metric and interactive evaluation in comparison to state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation on the utterance level.

Knowledge Distillation, Open-Domain Dialog

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

no code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.

Multi-agent Reinforcement Learning

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

3 code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.

Multi-agent Reinforcement Learning

Learning via social awareness: Improving a deep generative sketching model with facial feedback

no code implementations 13 Feb 2018 Natasha Jaques, Jennifer McCleary, Jesse Engel, David Ha, Fred Bertsch, Rosalind Picard, Douglas Eck

We use a Latent Constraints GAN (LC-GAN) to learn from the facial feedback of a small group of viewers, by optimizing the model to produce sketches that it predicts will lead to more positive facial expressions.

Multimodal Autoencoder: A Deep Learning Approach to Filling In Missing Sensor Data and Enabling Better Mood Prediction

1 code implementation 26 Oct 2017 Natasha Jaques, Sara Taylor, Akane Sano, Rosalind W. Picard

To accomplish forecasting of mood in real-world situations, affective computing systems need to collect and learn from multimodal data collected over weeks or months of daily use.

Denoising

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

no code implementations ICML 2017 Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity.

Fine-tuning
