Search Results for author: Nicolas Heess

Found 136 papers, 48 papers with code

CoMic: Co-Training and Mimicry for Reusable Skills

no code implementations • ICML 2020 • Leonard Hasenclever, Fabio Pardo, Raia Hadsell, Nicolas Heess, Josh Merel

Finally we show that it is possible to interleave the motion capture tracking with training on complementary tasks, enriching the resulting skill space, and enabling the reuse of skills not well covered by the motion capture data such as getting up from the ground or catching a ball.

Continuous Control Reinforcement Learning (RL)

Paper
Add Code

A distributional view on multi objective policy optimization

no code implementations • ICML 2020 • Abbas Abdolmaleki, Sandy Huang, Leonard Hasenclever, Michael Neunert, Martina Zambelli, Murilo Martins, Francis Song, Nicolas Heess, Raia Hadsell, Martin Riedmiller

Many real-world problems require trading off multiple competing objectives.

Multi-Objective Reinforcement Learning

Paper
Add Code

The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models

no code implementations • 4 Apr 2024 • Noah Y. Siegel, Oana-Maria Camburu, Nicolas Heess, Maria Perez-Ortiz

In this work, we introduce Correlational Explanatory Faithfulness (CEF), a metric that can be used in faithfulness tests based on input interventions.

counterfactual Decision Making

Paper
Add Code

Genie: Generative Interactive Environments

no code implementations • 23 Feb 2024 • Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos.

Paper
Add Code

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

no code implementations • 12 Feb 2024 • Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

In each iteration, the image is annotated with a visual representation of proposals that the VLM can refer to (e. g., candidate robot actions, localizations, or trajectories).

Instruction Following Logical Reasoning +3

Paper
Add Code

Offline Actor-Critic Reinforcement Learning Scales to Large Models

no code implementations • 8 Feb 2024 • Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning.

Continuous Control Offline RL +1

Paper
Add Code

Neural Population Learning beyond Symmetric Zero-sum Games

no code implementations • 10 Jan 2024 • SiQi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess

We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.

Transfer Learning

Paper
Add Code

Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities

no code implementations • 4 Dec 2023 • Markus Wulfmeier, Arunkumar Byravan, Sarah Bechtle, Karol Hausman, Nicolas Heess

Contemporary artificial intelligence systems exhibit rapidly growing abilities accompanied by the growth of required resources, expansive datasets and corresponding investments into computing infrastructure.

Computational Efficiency reinforcement-learning +1

Paper
Add Code

Replay across Experiments: A Natural Extension of Off-Policy RL

no code implementations • 27 Nov 2023 • Dhruva Tirumala, Thomas Lampe, Jose Enrique Chen, Tuomas Haarnoja, Sandy Huang, Guy Lever, Ben Moran, Tim Hertweck, Leonard Hasenclever, Martin Riedmiller, Nicolas Heess, Markus Wulfmeier

Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL).

Reinforcement Learning (RL)

Paper
Add Code

TacticAI: an AI assistant for football tactics

no code implementations • 16 Oct 2023 • Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls

The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC.

Retrieval

Paper
Add Code

Policy composition in reinforcement learning via multi-objective policy optimization

no code implementations • 29 Aug 2023 • Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup

In two domains with continuous observation and action spaces, our agents successfully compose teacher policies in sequence and in parallel, and are also able to further extend the policies of the teachers in order to solve the task.

reinforcement-learning

Paper
Add Code

Towards A Unified Agent with Foundation Models

no code implementations • 18 Jul 2023 • Norman Di Palo, Arunkumar Byravan, Leonard Hasenclever, Markus Wulfmeier, Nicolas Heess, Martin Riedmiller

Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others.

Efficient Exploration Reinforcement Learning (RL) +2

Paper
Add Code

RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

no code implementations • 20 Jun 2023 • Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth, Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg, Raia Hadsell, Francesco Nori, Nicolas Heess

With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task.

Paper
Add Code

Language to Rewards for Robotic Skill Synthesis

no code implementations • 14 Jun 2023 • Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia

However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot.

In-Context Learning Logical Reasoning

Paper
Add Code

Coherent Soft Imitation Learning

1 code implementation • NeurIPS 2023 • Joe Watson, Sandy H. Huang, Nicolas Heess

Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward.

Imitation Learning reinforcement-learning

Paper
Code

Barkour: Benchmarking Animal-level Agility with Quadruped Robots

no code implementations • 24 May 2023 • Ken Caluwaerts, Atil Iscen, J. Chase Kew, Wenhao Yu, Tingnan Zhang, Daniel Freeman, Kuang-Huei Lee, Lisa Lee, Stefano Saliceti, Vincent Zhuang, Nathan Batchelor, Steven Bohez, Federico Casarini, Jose Enrique Chen, Omar Cortes, Erwin Coumans, Adil Dostmohamed, Gabriel Dulac-Arnold, Alejandro Escontrela, Erik Frey, Roland Hafner, Deepali Jain, Bauyrjan Jyenis, Yuheng Kuang, Edward Lee, Linda Luu, Ofir Nachum, Ken Oslund, Jason Powell, Diego Reyes, Francesco Romano, Feresteh Sadeghi, Ron Sloat, Baruch Tabanpour, Daniel Zheng, Michael Neunert, Raia Hadsell, Nicolas Heess, Francesco Nori, Jeff Seto, Carolina Parada, Vikas Sindhwani, Vincent Vanhoucke, Jie Tan

In the second approach, we distill the specialist skills into a Transformer-based generalist locomotion policy, named Locomotion-Transformer, that can handle various terrains and adjust the robot's gait based on the perceived environment and robot states.

Benchmarking Navigate

Paper
Add Code

A Generalist Dynamics Model for Control

no code implementations • 18 May 2023 • Ingmar Schubert, Jingwei Zhang, Jake Bruce, Sarah Bechtle, Emilio Parisotto, Martin Riedmiller, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Nicolas Heess

We investigate the use of transformer sequence models as dynamics models (TDMs) for control.

Paper
Add Code

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

no code implementations • 26 Apr 2023 • Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess

We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments.

reinforcement-learning

Paper
Add Code

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

no code implementations • 13 Apr 2023 • Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, Yusuf Aytar

We demonstrate that appropriate placement of our parameter efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation and thus preserving original capabilities of the pretrained model.

Paper
Add Code

Leveraging Jumpy Models for Planning and Fast Learning in Robotic Domains

no code implementations • 24 Feb 2023 • Jingwei Zhang, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Abbas Abdolmaleki, Dushyant Rao, Nicolas Heess, Martin Riedmiller

We conduct a set of experiments in the RGB-stacking environment, showing that planning with the learned skills and the associated model can enable zero-shot generalization to new tasks, and can further speed up training of policies via reinforcement learning.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Representation Learning in Deep RL via Discrete Information Bottleneck

no code implementations • 28 Dec 2022 • Riashat Islam, Hongyu Zang, Manan Tomar, Aniket Didolkar, Md Mofijul Islam, Samin Yeasar Arnob, Tariq Iqbal, Xin Li, Anirudh Goyal, Nicolas Heess, Alex Lamb

Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations.

Offline RL Reinforcement Learning (RL) +1

Paper
Add Code

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

no code implementations • 24 Nov 2022 • Giulia Vezzani, Dhruva Tirumala, Markus Wulfmeier, Dushyant Rao, Abbas Abdolmaleki, Ben Moran, Tuomas Haarnoja, Jan Humplik, Roland Hafner, Michael Neunert, Claudio Fantacci, Tim Hertweck, Thomas Lampe, Fereshteh Sadeghi, Nicolas Heess, Martin Riedmiller

The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents.

Reinforcement Learning (RL)

Paper
Add Code

NeRF2Real: Sim2real Transfer of Vision-guided Bipedal Motion Skills using Neural Radiance Fields

no code implementations • 10 Oct 2022 • Arunkumar Byravan, Jan Humplik, Leonard Hasenclever, Arthur Brussee, Francesco Nori, Tuomas Haarnoja, Ben Moran, Steven Bohez, Fereshteh Sadeghi, Bojan Vujatovic, Nicolas Heess

A simulation is then created using the rendering engine in a physics simulator which computes contact dynamics from the static scene geometry (estimated from the NeRF volume density) and the dynamic objects' geometry and physical properties (assumed known).

Novel View Synthesis

Paper
Add Code

Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning

2 code implementations • 4 Oct 2022 • Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio

We formalize the notions of coordination level and heterogeneity level of an environment and present HECOGrid, a suite of multi-agent RL environments that facilitates empirical evaluation of different MARL approaches across different levels of coordination and environmental heterogeneity by providing a quantitative control over coordination and heterogeneity levels of the environment.

Multi-agent Reinforcement Learning reinforcement-learning +1

Paper
Code

MO2: Model-Based Offline Options

no code implementations • 5 Sep 2022 • Sasha Salter, Markus Wulfmeier, Dhruva Tirumala, Nicolas Heess, Martin Riedmiller, Raia Hadsell, Dushyant Rao

The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence.

Continuous Control

Paper
Add Code

Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

no code implementations • 31 May 2022 • SiQi Liu, Marc Lanctot, Luke Marris, Nicolas Heess

Learning to play optimally against any mixture over a diverse set of strategies is of important practical interests in competitive games.

Paper
Add Code

Data augmentation for efficient learning from parametric experts

no code implementations • NeurIPS 2021 • Alexandre Galashov, Josh Merel, Nicolas Heess

This setting arises naturally in a number of problems, for instance as variants of behavior cloning, or as a component of other algorithms such as DAGGER, policy distillation or KL-regularized RL.

Data Augmentation Imitation Learning

Paper
Add Code

Coordinating Policies Among Multiple Agents via an Intelligent Communication Channel

no code implementations • 21 May 2022 • Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio

In Multi-Agent Reinforcement Learning (MARL), specialized channels are often introduced that allow agents to communicate directly with one another.

Intelligent Communication Multi-agent Reinforcement Learning +2

Paper
Add Code

A Generalist Agent

3 code implementations • DeepMind 2022 • Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs.

Ranked #1 on Skill Generalization on RGB-Stacking

Language Modelling Skill Generalization +1

187

Paper
Code

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

1 code implementation • 21 Apr 2022 • Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, SiQi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks.

Continuous Control reinforcement-learning +1

Paper
Code

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

1 code implementation • ICLR 2022 • Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez

We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.

Offline RL Off-policy evaluation +1

Paper
Code

Forgetting and Imbalance in Robot Lifelong Learning with Off-policy Data

no code implementations • 12 Apr 2022 • Wenxuan Zhou, Steven Bohez, Jan Humplik, Abbas Abdolmaleki, Dushyant Rao, Markus Wulfmeier, Tuomas Haarnoja, Nicolas Heess

We propose the Offline Distillation Pipeline to break this trade-off by separating the training procedure into an online interaction phase and an offline distillation phase. Second, we find that training with the imbalanced off-policy data from multiple environments across the lifetime creates a significant performance drop.

Reinforcement Learning (RL)

Paper
Add Code

Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors

no code implementations • 31 Mar 2022 • Steven Bohez, Saran Tunyasuvunakool, Philemon Brakel, Fereshteh Sadeghi, Leonard Hasenclever, Yuval Tassa, Emilio Parisotto, Jan Humplik, Tuomas Haarnoja, Roland Hafner, Markus Wulfmeier, Michael Neunert, Ben Moran, Noah Siegel, Andrea Huber, Francesco Romano, Nathan Batchelor, Federico Casarini, Josh Merel, Raia Hadsell, Nicolas Heess

We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots.

Paper
Add Code

Retrieval-Augmented Reinforcement Learning

no code implementations • 17 Feb 2022 • Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Peter C. Humphreys, Ksenia Konyushkova, Laurent SIfre, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell

In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

NeuPL: Neural Population Learning

no code implementations • ICLR 2022 • SiQi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel

Learning in strategy games (e. g. StarCraft, poker) requires the discovery of diverse policies.

Starcraft Transfer Learning

Paper
Add Code

Learning Transferable Motor Skills with Hierarchical Latent Mixture Policies

no code implementations • ICLR 2022 • Dushyant Rao, Fereshteh Sadeghi, Leonard Hasenclever, Markus Wulfmeier, Martina Zambelli, Giulia Vezzani, Dhruva Tirumala, Yusuf Aytar, Josh Merel, Nicolas Heess, Raia Hadsell

We demonstrate in manipulation domains that the method can effectively cluster offline data into distinct, executable behaviours, while retaining the flexibility of a continuous latent variable model.

Paper
Add Code

Entropic Desired Dynamics for Intrinsic Control

no code implementations • NeurIPS 2021 • Steven Hansen, Guillaume Desjardins, Kate Baumli, David Warde-Farley, Nicolas Heess, Simon Osindero, Volodymyr Mnih

An agent might be said, informally, to have mastery of its environment when it has maximised the effective number of states it can reliably reach.

Montezuma's Revenge

Paper
Add Code

Learning Coordinated Terrain-Adaptive Locomotion by Imitating a Centroidal Dynamics Planner

no code implementations • 30 Oct 2021 • Philemon Brakel, Steven Bohez, Leonard Hasenclever, Nicolas Heess, Konstantinos Bousmalis

Imitation learning circumvents this problem and has been used with motion capture data to extract quadruped gaits for flat terrains.

Imitation Learning Reinforcement Learning (RL)

Paper
Add Code

Evaluating model-based planning and planner amortization for continuous control

no code implementations • ICLR 2022 • Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller

There is a widespread intuition that model-based control methods should be able to surpass the data efficiency of model-free approaches.

Continuous Control Model Predictive Control

Paper
Add Code

Learning Dynamics Models for Model Predictive Agents

no code implementations • 29 Sep 2021 • Michael Lutter, Leonard Hasenclever, Arunkumar Byravan, Gabriel Dulac-Arnold, Piotr Trochim, Nicolas Heess, Josh Merel, Yuval Tassa

This paper sets out to disambiguate the role of different design choices for learning dynamics models, by comparing their performance to planning with a ground-truth model -- the simulator.

Model-based Reinforcement Learning

Paper
Add Code

Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration

no code implementations • 17 Sep 2021 • Oliver Groth, Markus Wulfmeier, Giulia Vezzani, Vibhavari Dasagi, Tim Hertweck, Roland Hafner, Nicolas Heess, Martin Riedmiller

Curiosity-based reward schemes can present powerful exploration mechanisms which facilitate the discovery of solutions for complex, sparse or long-horizon tasks.

Paper
Add Code

Collect & Infer -- a fresh look at data-efficient Reinforcement Learning

no code implementations • 23 Aug 2021 • Martin Riedmiller, Jost Tobias Springenberg, Roland Hafner, Nicolas Heess

This position paper proposes a fresh look at Reinforcement Learning (RL) from the perspective of data-efficiency.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

no code implementations • 15 Jun 2021 • Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.

Offline RL reinforcement-learning +1

Paper
Add Code

From Motor Control to Team Play in Simulated Humanoid Football

1 code implementation • 25 May 2021 • SiQi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds.

Imitation Learning Multi-agent Reinforcement Learning +1

3,539

Paper
Code

Neural Production Systems: Learning Rule-Governed Visual Dynamics

no code implementations • NeurIPS 2021 • Anirudh Goyal, Aniket Didolkar, Nan Rosemary Ke, Charles Blundell, Philippe Beaudoin, Nicolas Heess, Michael Mozer, Yoshua Bengio

First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be.

Paper
Add Code

Explicit Pareto Front Optimization for Constrained Reinforcement Learning

no code implementations • 1 Jan 2021 • Sandy Huang, Abbas Abdolmaleki, Philemon Brakel, Steven Bohez, Nicolas Heess, Martin Riedmiller, Raia Hadsell

We propose a framework that uses a multi-objective RL algorithm to find a Pareto front of policies that trades off between the reward and constraint(s), and simultaneously searches along this front for constraint-satisfying policies.

Continuous Control reinforcement-learning +1

Paper
Add Code

Divide-and-Conquer Monte Carlo Tree Search

no code implementations • 1 Jan 2021 • Giambattista Parascandolo, Lars Holger Buesing, Josh Merel, Leonard Hasenclever, John Aslanides, Jessica B Hamrick, Nicolas Heess, Alexander Neitz, Theophane Weber

are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed.

Continuous Control Decision Making +1

Paper
Add Code

Model-Free Counterfactual Credit Assignment

no code implementations • 1 Jan 2021 • Thomas Mesnard, Theophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Marcus Hutter, Lars Holger Buesing, Remi Munos

Credit assignment in reinforcement learning is the problem of measuring an action’s influence on future rewards.

counterfactual valid

Paper
Add Code

RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning

1 code implementation • NeurIPS 2020 • Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Offline RL reinforcement-learning +1

12,778

Paper
Code

Game Plan: What AI can do for Football, and What Football can do for AI

1 code implementation • 18 Nov 2020 • Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, Ali Eslami, Mark Rowland, Andrew Jaegle, Remi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis

The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis.

BIG-bench Machine Learning counterfactual +1

Paper
Code

Counterfactual Credit Assignment in Model-Free Reinforcement Learning

no code implementations • 18 Nov 2020 • Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos

Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards.

counterfactual reinforcement-learning +1

Paper
Add Code

Behavior Priors for Efficient Reinforcement Learning

no code implementations • 27 Oct 2020 • Dhruva Tirumala, Alexandre Galashov, Hyeonwoo Noh, Leonard Hasenclever, Razvan Pascanu, Jonathan Schwarz, Guillaume Desjardins, Wojciech Marian Czarnecki, Arun Ahuja, Yee Whye Teh, Nicolas Heess

In this work we consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors that capture the common movement and interaction patterns that are shared across a set of related tasks or contexts.

Continuous Control Hierarchical Reinforcement Learning +3

Paper
Add Code

Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification

no code implementations • 20 Oct 2020 • Daniel J. Mankowitz, Dan A. Calian, Rae Jeong, Cosmin Paduraru, Nicolas Heess, Sumanth Dathathri, Martin Riedmiller, Timothy Mann

Many real-world physical control systems are required to satisfy constraints upon deployment.

Continuous Control reinforcement-learning +1

Paper
Add Code

Learning Dexterous Manipulation from Suboptimal Experts

no code implementations • 16 Oct 2020 • Rae Jeong, Jost Tobias Springenberg, Jackie Kay, Daniel Zheng, Yuxiang Zhou, Alexandre Galashov, Nicolas Heess, Francesco Nori

Although in many cases the learning process could be guided by demonstrations or other suboptimal experts, current RL algorithms for continuous action spaces often fail to effectively utilize combinations of highly off-policy expert data and on-policy exploration data.

Offline RL Q-Learning

Paper
Add Code

Local Search for Policy Iteration in Continuous Control

no code implementations • 12 Oct 2020 • Jost Tobias Springenberg, Nicolas Heess, Daniel Mankowitz, Josh Merel, Arunkumar Byravan, Abbas Abdolmaleki, Jackie Kay, Jonas Degrave, Julian Schrittwieser, Yuval Tassa, Jonas Buchli, Dan Belov, Martin Riedmiller

We demonstrate that additional computation spent on model-based policy improvement during learning can improve data efficiency, and confirm that model-based policy improvement during action selection can also be beneficial.

Continuous Control Reinforcement Learning (RL)

Paper
Add Code

Temporal Difference Uncertainties as a Signal for Exploration

no code implementations • 5 Oct 2020 • Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin, Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, Andre Barreto, Razvan Pascanu

Instead, we incorporate it as an intrinsic reward and treat exploration as a separate learning problem, induced by the agent's temporal difference uncertainties.

Paper
Add Code

Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

no code implementations • 3 Oct 2020 • Peter Karkus, Mehdi Mirza, Arthur Guez, Andrew Jaegle, Timothy Lillicrap, Lars Buesing, Nicolas Heess, Theophane Weber

We explore whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles similarly to classic robot architectures.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learning to swim in potential flow

1 code implementation • 30 Sep 2020 • Yusheng Jiao, Feng Ling, Sina Heydari, Nicolas Heess, Josh Merel, Eva Kanso

To address the problem of underwater motion planning, we propose a simple model of a three-link fish swimming in a potential flow environment and we use model-free reinforcement learning for shape control.

Motion Planning reinforcement-learning +1

Paper
Code

Physically Embedded Planning Problems: New Challenges for Reinforcement Learning

1 code implementation • 11 Sep 2020 • Mehdi Mirza, Andrew Jaegle, Jonathan J. Hunt, Arthur Guez, Saran Tunyasuvunakool, Alistair Muldal, Théophane Weber, Peter Karkus, Sébastien Racanière, Lars Buesing, Timothy Lillicrap, Nicolas Heess

To encourage progress towards this goal we introduce a set of physically embedded planning problems and make them publicly available.

reinforcement-learning Reinforcement Learning (RL)

12,779

Paper
Code

Importance Weighted Policy Learning and Adaptation

no code implementations • 10 Sep 2020 • Alexandre Galashov, Jakub Sygnowski, Guillaume Desjardins, Jan Humplik, Leonard Hasenclever, Rae Jeong, Yee Whye Teh, Nicolas Heess

The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones.

Meta Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Action and Perception as Divergence Minimization

1 code implementation • 3 Sep 2020 • Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

While the narrow objectives correspond to domain-specific rewards as typical in reinforcement learning, the general objectives maximize information with the environment through latent variable models of input sequences.

Decision Making Representation Learning

Paper
Code

Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion

no code implementations • 6 Aug 2020 • Roland Hafner, Tim Hertweck, Philipp Klöppner, Michael Bloesch, Michael Neunert, Markus Wulfmeier, Saran Tunyasuvunakool, Nicolas Heess, Martin Riedmiller

Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs.

Reinforcement Learning (RL)

Paper
Add Code

Data-efficient Hindsight Off-policy Option Learning

no code implementations • 30 Jul 2020 • Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller

We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm.

Robot Manipulation

Paper
Add Code

Critic Regularized Regression

5 code implementations • NeurIPS 2020 • Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

Offline RL regression +1

30,980

Paper
Code

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

2 code implementations • 24 Jun 2020 • Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

Atari Games DQN Replay Dataset +3

12,778

Paper
Code

dm_control: Software and Tasks for Continuous Control

2 code implementations • 22 Jun 2020 • Yuval Tassa, Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Piotr Trochim, Si-Qi Liu, Steven Bohez, Josh Merel, Tom Erez, Timothy Lillicrap, Nicolas Heess

The dm_control software package is a collection of Python libraries and task suites for reinforcement learning agents in an articulated-body simulation.

Continuous Control reinforcement-learning +1

3,539

Paper
Code

A Distributional View on Multi-Objective Policy Optimization

1 code implementation • 15 May 2020 • Abbas Abdolmaleki, Sandy H. Huang, Leonard Hasenclever, Michael Neunert, H. Francis Song, Martina Zambelli, Murilo F. Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller

Many real-world problems require trading off multiple competing objectives.

Multi-Objective Reinforcement Learning

3,539

Paper
Code

Simple Sensor Intentions for Exploration

no code implementations • 15 May 2020 • Tim Hertweck, Martin Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Heess

In particular, we show that a real robotic arm can learn to grasp and lift and solve a Ball-in-a-Cup task from scratch, when only raw sensor streams are used for both controller input and in the auxiliary reward definition.

Paper
Add Code

Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning

no code implementations • ICLR 2020 • Noah Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller

In practice, however, standard off-policy algorithms fail in the batch setting for continuous control.

Continuous Control Multi-Task Learning +2

Paper
Add Code

Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning

no code implementations • 23 Apr 2020 • Giambattista Parascandolo, Lars Buesing, Josh Merel, Leonard Hasenclever, John Aslanides, Jessica B. Hamrick, Nicolas Heess, Alexander Neitz, Theophane Weber

are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed.

Continuous Control Decision Making +1

Paper
Add Code

Value-driven Hindsight Modelling

no code implementations • NeurIPS 2020 • Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess

Value estimation is a critical component of the reinforcement learning (RL) paradigm.

Atari Games Reinforcement Learning (RL) +2

Paper
Add Code

Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

no code implementations • 19 Feb 2020 • Noah Y. Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller

In practice, however, standard off-policy algorithms fail in the batch setting for continuous control.

Continuous Control Multi-Task Learning +2

Paper
Add Code

Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

no code implementations • 2 Jan 2020 • Michael Neunert, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Jost Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, Martin Riedmiller

In contrast, we propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Hindsight Credit Assignment

1 code implementation • NeurIPS 2019 • Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos

We consider the problem of efficient credit assignment in reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Catch & Carry: Reusable Neural Controllers for Vision-Guided Whole-Body Tasks

no code implementations • 15 Nov 2019 • Josh Merel, Saran Tunyasuvunakool, Arun Ahuja, Yuval Tassa, Leonard Hasenclever, Vu Pham, Tom Erez, Greg Wayne, Nicolas Heess

We address the longstanding challenge of producing flexible, realistic humanoid character controllers that can perform diverse whole-body tasks involving object interactions.

Paper
Add Code

Quinoa: a Q-function You Infer Normalized Over Actions

no code implementations • 5 Nov 2019 • Jonas Degrave, Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Martin Riedmiller

We present an algorithm for learning an approximate action-value soft Q-function in the relative entropy regularised reinforcement learning setting, for which an optimal improved policy can be recovered in closed form.

Normalising Flows reinforcement-learning +1

Paper
Add Code

Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

1 code implementation • 15 Oct 2019 • Lars Buesing, Nicolas Heess, Theophane Weber

A plethora of problems in AI, engineering and the sciences are naturally formalized as inference in discrete probabilistic models.

Decision Making Decision Making Under Uncertainty

Paper
Code

Stabilizing Transformers for Reinforcement Learning

5 code implementations • ICML 2020 • Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell

Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting.

General Reinforcement Learning Language Modelling +4

2,505

Paper
Code

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

no code implementations • 9 Oct 2019 • Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments.

Model-based Reinforcement Learning Reinforcement Learning (RL) +2

Paper
Add Code

A Generalized Training Approach for Multiagent Learning

1 code implementation • ICLR 2020 • Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Si-Qi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos

This paper investigates a population-based training regime based on game-theoretic principles called Policy-Spaced Response Oracles (PSRO).

3,989

Paper
Code

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

1 code implementation • ICLR 2020 • H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Si-Qi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick

Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting.

Continuous Control OpenAI Gym +1

Paper
Code

Compositional Transfer in Hierarchical Reinforcement Learning

no code implementations • 26 Jun 2019 • Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements.

General Reinforcement Learning Hierarchical Reinforcement Learning +4

Paper
Add Code

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

no code implementations • NeurIPS 2020 • Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

A main benefit of DirPG algorithms is that they allow the insertion of domain knowledge in the form of upper bounds on return-to-go at training time, like is used in heuristic search, while still directly computing a policy gradient.

Paper
Add Code

Meta reinforcement learning as task inference

1 code implementation • 15 May 2019 • Jan Humplik, Alexandre Galashov, Leonard Hasenclever, Pedro A. Ortega, Yee Whye Teh, Nicolas Heess

This includes proposals to learn the learning algorithm itself, an idea also known as meta learning.

Continuous Control Meta-Learning +3

172

Paper
Code

Meta-learning of Sequential Strategies

no code implementations • 8 May 2019 • Pedro A. Ortega, Jane. X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg

In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.

Meta-Learning

Paper
Add Code

Information asymmetry in KL-regularized RL

1 code implementation • ICLR 2019 • Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess

In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning.

Paper
Code

Exploiting Hierarchy for Learning and Transfer in KL-regularized RL

no code implementations • 18 Mar 2019 • Dhruva Tirumala, Hyeonwoo Noh, Alexandre Galashov, Leonard Hasenclever, Arun Ahuja, Greg Wayne, Razvan Pascanu, Yee Whye Teh, Nicolas Heess

As reinforcement learning agents are tasked with solving more challenging and diverse tasks, the ability to incorporate prior knowledge into the learning system and to exploit reusable structure in solution space is likely to become increasingly important.

Continuous Control reinforcement-learning +1

Paper
Add Code

The Termination Critic

no code implementations • 26 Feb 2019 • Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents.

Paper
Add Code

Emergent Coordination Through Competition

no code implementations • ICLR 2019 • Si-Qi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, Thore Graepel

We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics.

Continuous Control Reinforcement Learning (RL)

Paper
Add Code

Credit Assignment Techniques in Stochastic Computation Graphs

no code implementations • 7 Jan 2019 • Théophane Weber, Nicolas Heess, Lars Buesing, David Silver

Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Self-supervised Learning of Image Embedding for Continuous Control

1 code implementation • 3 Jan 2019 • Carlos Florensa, Jonas Degrave, Nicolas Heess, Jost Tobias Springenberg, Martin Riedmiller

Operating directly from raw high dimensional sensory inputs like images is still a challenge for robotic control.

Continuous Control Self-Supervised Learning

Paper
Code

Composing Entropic Policies using Divergence Correction

no code implementations • 5 Dec 2018 • Jonathan J. Hunt, Andre Barreto, Timothy P. Lillicrap, Nicolas Heess

Composing previously mastered skills to solve novel tasks promises dramatic improvements in the data efficiency of reinforcement learning.

Continuous Control Reinforcement Learning (RL)

Paper
Add Code

Relative Entropy Regularized Policy Iteration

1 code implementation • 5 Dec 2018 • Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa, Dan Belov, Nicolas Heess, Martin Riedmiller

Our algorithm draws on connections to existing literature on black-box optimization and 'RL as an inference' and it can be seen either as an extension of the Maximum a Posteriori Policy Optimisation algorithm (MPO) [Abdolmaleki et al., 2018a], or as an extension of Trust Region Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) [Abdolmaleki et al., 2017b; Hansen et al., 1997] to a policy iteration scheme.

Continuous Control OpenAI Gym +1

Paper
Code

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures

no code implementations • ICLR 2019 • Jonathan Uesato, Ananya Kumar, Csaba Szepesvari, Tom Erez, Avraham Ruderman, Keith Anderson, Krishmamurthy, Dvijotham, Nicolas Heess, Pushmeet Kohli

We demonstrate this is an issue for current agents, where even matching the compute used for training is sometimes insufficient for evaluation.

Autonomous Driving Humanoid Control

Paper
Add Code

Neural probabilistic motor primitives for humanoid control

no code implementations • ICLR 2019 • Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, Nicolas Heess

We focus on the problem of learning a single motor module that can flexibly express a range of behaviors for the control of high-dimensional physically simulated humanoids.

Humanoid Control

Paper
Add Code

Hierarchical visuomotor control of humanoids

no code implementations • ICLR 2019 • Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Si-Qi Liu, Dhruva Tirumala, Nicolas Heess, Greg Wayne

We aim to build complex humanoid agents that integrate perception, motor control, and memory.

Paper
Add Code

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

no code implementations • ICLR 2019 • Lars Buesing, Theophane Weber, Yori Zwols, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau, Nicolas Heess

In contrast to off-policy algorithms based on Importance Sampling which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data.

counterfactual

Paper
Add Code

Success at any cost: value constrained model-free continuous control

no code implementations • 27 Sep 2018 • Steven Bohez, Abbas Abdolmaleki, Michael Neunert, Jonas Buchli, Nicolas Heess, Raia Hadsell

We demonstrate the efficiency of our approach using a number of continuous control benchmark tasks as well as a realistic, energy-optimized quadruped locomotion task.

Continuous Control

Paper
Add Code

Mix & Match - Agent Curricula for Reinforcement Learning

no code implementations • ICML 2018 • Wojciech Czarnecki, Siddhant Jayakumar, Max Jaderberg, Leonard Hasenclever, Yee Whye Teh, Nicolas Heess, Simon Osindero, Razvan Pascanu

We introduce Mix and match (M&M) – a training framework designed to facilitate rapid and effective learning in RL agents that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Maximum a Posteriori Policy Optimisation

3 code implementations • ICLR 2018 • Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller

We introduce a new algorithm for reinforcement learning called Maximum aposteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective.

Continuous Control reinforcement-learning +1

113

Paper
Code

Mix&Match - Agent Curricula for Reinforcement Learning

no code implementations • 5 Jun 2018 • Wojciech Marian Czarnecki, Siddhant M. Jayakumar, Max Jaderberg, Leonard Hasenclever, Yee Whye Teh, Simon Osindero, Nicolas Heess, Razvan Pascanu

(2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agents internal state.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Relational inductive biases, deep learning, and graph networks

31 code implementations • 4 Jun 2018 • Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, Razvan Pascanu

As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.

Decision Making Inductive Bias +1

5,323

Paper
Code

Graph networks as learnable physics engines for inference and control

1 code implementation • ICML 2018 • Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, Peter Battaglia

Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model.

Inductive Bias

Paper
Code

Distributed Distributional Deterministic Policy Gradients

5 code implementations • ICLR 2018 • Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap

This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting.

Continuous Control Reinforcement Learning (RL)

2,505

Paper
Code

Learning by Playing - Solving Sparse Reward Tasks from Scratch

1 code implementation • ICML 2018 • Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Jost Tobias Springenberg

We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL).

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Code

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

1 code implementation • ICLR 2018 • Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas, Nicolas Heess

We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent.

Imitation Learning reinforcement-learning +1

Paper
Code

Learning an Embedding Space for Transferable Robot Skills

no code implementations • ICLR 2018 • Karol Hausman, Jost Tobias Springenberg, Ziyu Wang, Nicolas Heess, Martin Riedmiller

We present a method for reinforcement learning of closely related skills that are parameterized via a skill embedding space.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

4 code implementations • 27 Jul 2017 • Mel Vecerik, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, Martin Riedmiller

We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards.

reinforcement-learning Reinforcement Learning (RL)

813

Paper
Code

Learning model-based planning from scratch

2 code implementations • 19 Jul 2017 • Razvan Pascanu, Yujia Li, Oriol Vinyals, Nicolas Heess, Lars Buesing, Sebastien Racanière, David Reichert, Théophane Weber, Daan Wierstra, Peter Battaglia

Here we introduce the "Imagination-based Planner", the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans.

Continuous Control Decision Making

Paper
Code

Imagination-Augmented Agents for Deep Reinforcement Learning

2 code implementations • NeurIPS 2017 • Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra

We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects.

Model-based Reinforcement Learning reinforcement-learning +1

Paper
Code

Distral: Robust Multitask Reinforcement Learning

no code implementations • NeurIPS 2017 • Yee Whye Teh, Victor Bapst, Wojciech Marian Czarnecki, John Quan, James Kirkpatrick, Raia Hadsell, Nicolas Heess, Razvan Pascanu

Moreover, the proposed learning process is more robust and more stable---attributes that are critical in deep reinforcement learning.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Robust Imitation of Diverse Behaviors

no code implementations • NeurIPS 2017 • Ziyu Wang, Josh Merel, Scott Reed, Greg Wayne, Nando de Freitas, Nicolas Heess

Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train.

Imitation Learning

Paper
Add Code

Learning human behaviors from motion capture by adversarial imitation

1 code implementation • 7 Jul 2017 • Josh Merel, Yuval Tassa, Dhruva TB, Sriram Srinivasan, Jay Lemmon, Ziyu Wang, Greg Wayne, Nicolas Heess

Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-dimensional humanoid bodies.

Imitation Learning reinforcement-learning +1

Paper
Code

Emergence of Locomotion Behaviours in Rich Environments

6 code implementations • 7 Jul 2017 • Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver

The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals.

reinforcement-learning Reinforcement Learning (RL)

614

Paper
Code

Learning Hierarchical Information Flow with Recurrent Neural Modules

no code implementations • NeurIPS 2017 • Danijar Hafner, Alex Irpan, James Davidson, Nicolas Heess

We propose ThalNet, a deep learning model inspired by neocortical communication via the thalamus.

Paper
Add Code

Filtering Variational Objectives

3 code implementations • NeurIPS 2017 • Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, andriy mnih, Arnaud Doucet, Yee Whye Teh

When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results.

76,579

Paper
Code

Metacontrol for Adaptive Imagination-Based Optimization

1 code implementation • 7 May 2017 • Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia

The metacontroller component is a model-free reinforcement learning agent, which decides both how many iterations of the optimization procedure to run, as well as which model to consult on each iteration.

Decision Making

Paper
Code

Data-efficient Deep Reinforcement Learning for Dexterous Manipulation

no code implementations • ICLR 2018 • Ivaylo Popov, Nicolas Heess, Timothy Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Yuval Tassa, Tom Erez, Martin Riedmiller

Solving this difficult and practically relevant problem in the real world is an important long-term goal for the field of robotics.

Continuous Control Q-Learning +2

Paper
Add Code

Particle Value Functions

no code implementations • 16 Mar 2017 • Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Arnaud Doucet, andriy mnih, Yee Whye Teh

The policy gradients of the expected return objective can react slowly to rare rewards.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

FeUdal Networks for Hierarchical Reinforcement Learning

1 code implementation • ICML 2017 • Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu

We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning.

Hierarchical Reinforcement Learning reinforcement-learning +1

Paper
Code

Sample Efficient Actor-Critic with Experience Replay

8 code implementations • 3 Nov 2016 • Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems.

Continuous Control reinforcement-learning +1

4,038

Paper
Code

Learning and Transfer of Modulated Locomotor Controllers

no code implementations • 17 Oct 2016 • Nicolas Heess, Greg Wayne, Yuval Tassa, Timothy Lillicrap, Martin Riedmiller, David Silver

We study a novel architecture and training procedure for locomotion tasks.

Paper
Add Code

Sim-to-Real Robot Learning from Pixels with Progressive Nets

no code implementations • 13 Oct 2016 • Andrei A. Rusu, Mel Vecerik, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, Raia Hadsell

The progressive net approach is a general framework that enables reuse of everything from low-level visual features to high-level policies for transfer to new tasks, enabling a compositional, yet simple, approach to building complex skills.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Unsupervised Learning of 3D Structure from Images

1 code implementation • NeurIPS 2016 • Danilo Jimenez Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess

A key goal of computer vision is to recover the underlying 3D structure from 2D observations of the world.

Paper
Code

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

2 code implementations • NeurIPS 2016 • S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey E. Hinton

We present a framework for efficient inference in structured image models that explicitly reason about objects.

Scene Understanding

Paper
Code

Memory-based control with recurrent neural networks

3 code implementations • 14 Dec 2015 • Nicolas Heess, Jonathan J. Hunt, Timothy P. Lillicrap, David Silver

Partially observed control problems are a challenging aspect of reinforcement learning.

Continuous Control

977

Paper
Code

Learning Continuous Control Policies by Stochastic Value Gradients

3 code implementations • NeurIPS 2015 • Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez

One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.

Continuous Control

2,505

Paper
Code

Continuous control with deep reinforcement learning

157 code implementations • 9 Sep 2015 • Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.

Ranked #3 on Continuous Control on Lunar Lander (OpenAI Gym)

Action Detection Continuous Control +3

30,980

Paper
Code

Gradient Estimation Using Stochastic Computation Graphs

1 code implementation • NeurIPS 2015 • John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world.

Paper
Code

Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages

1 code implementation • 9 Mar 2015 • Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess, S. M. Ali Eslami, Balaji Lakshminarayanan, Dino Sejdinovic, Zoltán Szabó

We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output.

regression

Paper
Code

Passing Expectation Propagation Messages with Kernel Methods

no code implementations • 2 Jan 2015 • Wittawat Jitkrittum, Arthur Gretton, Nicolas Heess

We propose to learn a kernel-based message operator which takes as input all expectation propagation (EP) incoming messages to a factor node and produces an outgoing message.

Paper
Add Code

Bayes-Adaptive Simulation-based Search with Value Function Approximation

no code implementations • NeurIPS 2014 • Arthur Guez, Nicolas Heess, David Silver, Peter Dayan

Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty.

Paper
Add Code

Recurrent Models of Visual Attention

19 code implementations • NeurIPS 2014 • Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels.

Hard Attention Image Classification +1

466

Paper
Code

Deterministic Policy Gradient Algorithms

1 code implementation • International Conference on Machine Learning 2014 • David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Learning to Pass Expectation Propagation Messages

no code implementations • NeurIPS 2013 • Nicolas Heess, Daniel Tarlow, John Winn

Expectation Propagation (EP) is a popular approximate posterior inference algorithm that often provides a fast and accurate alternative to sampling-based methods.

Paper
Add Code

Searching for objects driven by context

no code implementations • NeurIPS 2012 • Bogdan Alexe, Nicolas Heess, Yee W. Teh, Vittorio Ferrari

The dominant visual search paradigm for object class detection is sliding windows.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.