no code implementations • 21 Nov 2022 • Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Jirka Lhotka, Timothy Lillicrap, Alistair Muldal, George Powell, Adam Santoro, Guy Scully, Sanjana Srivastava, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu
Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning.
no code implementations • 15 Sep 2022 • Zeb Kurth-Nelson, Timothy Behrens, Greg Wayne, Kevin Miller, Lennart Luettgau, Ray Dolan, Yunzhe Liu, Philipp Schwartenbeck
Replay in the brain has been viewed as rehearsal, or, more recently, as sampling from a transition model.
no code implementations • 26 May 2022 • Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Timothy Lillicrap, Alistair Muldal, Blake Richards, Adam Santoro, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan
Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research.
no code implementations • 7 Dec 2021 • DeepMind Interactive Agents Team, Josh Abramson, Arun Ahuja, Arthur Brussee, Federico Carnevale, Mary Cassin, Felix Fischer, Petko Georgiev, Alex Goldin, Mansi Gupta, Tim Harley, Felix Hill, Peter C Humphreys, Alden Hung, Jessica Landon, Timothy Lillicrap, Hamza Merzic, Alistair Muldal, Adam Santoro, Guy Scully, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language.
no code implementations • 8 Jul 2021 • Andrew Jaegle, Yury Sulsky, Arun Ahuja, Jake Bruce, Rob Fergus, Greg Wayne
Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior.
2 code implementations • 24 Feb 2021 • David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song
We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two.
no code implementations • 10 Dec 2020 • Josh Abramson, Arun Ahuja, Iain Barr, Arthur Brussee, Federico Carnevale, Mary Cassin, Rachita Chhaparia, Stephen Clark, Bogdan Damoc, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Soňa Mokrá, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, Duncan Williams, Nathaniel Wong, Chen Yan, Rui Zhu
These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone.
1 code implementation • NeurIPS 2020 • David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, Joel Veness
We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks.
no code implementations • 6 Feb 2020 • Adam Marblestone, Yan Wu, Greg Wayne
An ideal cognitively-inspired memory system would compress and organize incoming items.
1 code implementation • NeurIPS 2019 • Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos
We consider the problem of efficient credit assignment in reinforcement learning.
no code implementations • ICLR 2020 • Josh Merel, Diego Aldarondo, Jesse Marshall, Yuval Tassa, Greg Wayne, Bence Ölveczky
In this work, we develop a virtual rodent as a platform for the grounded study of motor activity in artificial models of embodied control.
no code implementations • 15 Nov 2019 • Josh Merel, Saran Tunyasuvunakool, Arun Ahuja, Yuval Tassa, Leonard Hasenclever, Vu Pham, Tom Erez, Greg Wayne, Nicolas Heess
We address the longstanding challenge of producing flexible, realistic humanoid character controllers that can perform diverse whole-body tasks involving object interactions.
1 code implementation • NeurIPS 2019 • Ben Deverett, Ryan Faulkner, Meire Fortunato, Greg Wayne, Joel Z. Leibo
The measurement of time is central to intelligent behavior.
no code implementations • 18 Mar 2019 • Dhruva Tirumala, Hyeonwoo Noh, Alexandre Galashov, Leonard Hasenclever, Arun Ahuja, Greg Wayne, Razvan Pascanu, Yee Whye Teh, Nicolas Heess
As reinforcement learning agents are tasked with solving more challenging and diverse tasks, the ability to incorporate prior knowledge into the learning system and to exploit reusable structure in solution space is likely to become increasingly important.
1 code implementation • ICLR 2019 • Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity.
no code implementations • ICLR 2019 • Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, Nicolas Heess
We focus on the problem of learning a single motor module that can flexibly express a range of behaviors for the control of high-dimensional physically simulated humanoids.
no code implementations • ICLR 2019 • David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy P. Lillicrap, Greg Wayne
We examine this issue in the context of reinforcement learning, in a setting where an agent is exposed to tasks in a sequence.
no code implementations • ICLR 2019 • Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Si-Qi Liu, Dhruva Tirumala, Nicolas Heess, Greg Wayne
We aim to build complex humanoid agents that integrate perception, motor control, and memory.
1 code implementation • NeurIPS 2018 • Yan Wu, Greg Wayne, Karol Gregor, Timothy Lillicrap
Based on the idea of memory writing as inference, as proposed in the Kanerva Machine, we show that a likelihood-based Lyapunov function emerges from maximising the variational lower-bound of a generative memory.
no code implementations • 15 Oct 2018 • Chia-Chun Hung, Timothy Lillicrap, Josh Abramson, Yan Wu, Mehdi Mirza, Federico Carnevale, Arun Ahuja, Greg Wayne
Humans spend a remarkable fraction of waking life engaged in acts of "mental time travel".
no code implementations • ICLR 2018 • Yan Wu, Greg Wayne, Alex Graves, Timothy Lillicrap
We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them.
no code implementations • 3 Apr 2018 • Luis Piloto, Ari Weinstein, Dhruva TB, Arun Ahuja, Mehdi Mirza, Greg Wayne, David Amos, Chia-Chun Hung, Matt Botvinick
While some work on this problem has taken the approach of building in components such as ready-made physics engines, other research aims to extract general physical concepts directly from sensory data.
1 code implementation • 28 Mar 2018 • Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap
Animals execute goal-directed behaviours despite the limited range and scope of their sensors.
no code implementations • NeurIPS 2017 • Ziyu Wang, Josh Merel, Scott Reed, Greg Wayne, Nando de Freitas, Nicolas Heess
Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train.
1 code implementation • 7 Jul 2017 • Josh Merel, Yuval Tassa, Dhruva TB, Sriram Srinivasan, Jay Lemmon, Ziyu Wang, Greg Wayne, Nicolas Heess
Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-dimensional humanoid bodies.
6 code implementations • 7 Jul 2017 • Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver
The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals.
no code implementations • 15 Feb 2017 • Mevlana Gemici, Chia-Chun Hung, Adam Santoro, Greg Wayne, Shakir Mohamed, Danilo J. Rezende, David Amos, Timothy Lillicrap
We consider the general problem of modeling temporal data with long-range dependencies, wherein new observations are fully or partially predictable based on temporally-distant, past observations.
no code implementations • NeurIPS 2016 • Jack W. Rae, Jonathan J. Hunt, Tim Harley, Ivo Danihelka, Andrew Senior, Greg Wayne, Alex Graves, Timothy P. Lillicrap
SAM learns with comparable data efficiency to existing models on a range of synthetic tasks and one-shot Omniglot character recognition, and can scale to tasks requiring $100,\! 000$s of time steps and memories.
Ranked #6 on Question Answering on bAbi (Mean Error Rate metric)
no code implementations • 17 Oct 2016 • Nicolas Heess, Greg Wayne, Yuval Tassa, Timothy Lillicrap, Martin Riedmiller, David Silver
We study a novel architecture and training procedure for locomotion tasks.
no code implementations • 13 Jun 2016 • Adam Marblestone, Greg Wayne, Konrad Kording
We hypothesize that (1) the brain optimizes cost functions, (2) these cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior.
Neurons and Cognition
3 code implementations • 9 Feb 2016 • Ivo Danihelka, Greg Wayne, Benigno Uria, Nal Kalchbrenner, Alex Graves
We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters.
3 code implementations • NeurIPS 2015 • Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez
One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
35 code implementations • 20 Oct 2014 • Alex Graves, Greg Wayne, Ivo Danihelka
We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes.