In this work, we extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations.
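The mechanism being extended here is hindsight relabelling. A minimal sketch of its standard "final state" variant follows; it is not the demonstration-guided extension described above. After a failed episode, the transitions are copied with the goal replaced by the state actually reached, so a sparse reward becomes informative. `Transition`, `relabel_final` and `sparse_reward` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

State = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """Hypothetical goal-conditioned transition record."""
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(state: State, goal: State, tol: float = 1e-6) -> float:
    """Return 1.0 only when every coordinate of `state` matches `goal`."""
    return 1.0 if all(abs(s - g) <= tol for s, g in zip(state, goal)) else 0.0


def relabel_final(episode: List[Transition],
                  reward_fn: Callable[[State, State], float]) -> List[Transition]:
    """Copy the episode, substituting the finally achieved state as the goal.

    The original transitions are left untouched; the relabelled copies would
    typically be added to the replay buffer alongside them.
    """
    achieved = episode[-1].next_state
    return [replace(t, goal=achieved, reward=reward_fn(t.next_state, achieved))
            for t in episode]
```

An episode that never reaches its original goal yields only zero rewards, but its relabelled copy ends with a reward of 1.0 on the last transition.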
Reinforcement learning (RL) for continuous control typically employs distributions whose support covers the entire action space.
Curiosity-based reward schemes can provide powerful exploration mechanisms that facilitate the discovery of solutions for complex, sparse-reward or long-horizon tasks.
1 code implementation • 25 May 2021 • SiQi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess
In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds.
no code implementations • 3 Nov 2020 • Markus Wulfmeier, Arunkumar Byravan, Tim Hertweck, Irina Higgins, Ankush Gupta, Tejas Kulkarni, Malcolm Reynolds, Denis Teplyashin, Roland Hafner, Thomas Lampe, Martin Riedmiller
Furthermore, the value of each representation is evaluated in terms of three properties: dimensionality, observability and disentanglement.
Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs.
no code implementations • 30 Jul 2020 • Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller
We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm.
In particular, we show that a real robotic arm can learn grasping, lifting, and a Ball-in-a-Cup task from scratch, using only raw sensor streams both as controller input and in the auxiliary reward definition.
no code implementations • 2 Jan 2020 • Michael Neunert, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Jost Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, Martin Riedmiller
In contrast, we propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously.
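Optimising discrete and continuous actions simultaneously requires a policy whose output distribution covers both parts. The sketch below is a generic illustration, not the method from the paper. It assumes a policy that factorises into a categorical distribution over discrete options and a diagonal Gaussian over continuous dimensions, conditionally independent given the state. Both parts are then sampled in one step, and the joint log-probability used by a policy-gradient or off-policy update is the sum of the two terms. `HybridPolicyOutput` and the helper functions are hypothetical names.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HybridPolicyOutput:
    """Hypothetical parameters a policy network might emit for one state."""
    logits: List[float]   # unnormalised scores, one per discrete option
    mean: List[float]     # Gaussian mean, one per continuous dimension
    log_std: List[float]  # Gaussian log standard deviation, one per dimension


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(out: HybridPolicyOutput,
                         rng: random.Random) -> Tuple[int, List[float]]:
    """Sample the discrete and continuous parts of one action together."""
    probs = softmax(out.logits)
    discrete = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    continuous = [rng.gauss(mu, math.exp(ls)) for mu, ls in zip(out.mean, out.log_std)]
    return discrete, continuous


def hybrid_log_prob(out: HybridPolicyOutput, discrete: int,
                    continuous: List[float]) -> float:
    """Joint log-probability, assuming the two parts are independent given the state."""
    log_p = math.log(softmax(out.logits)[discrete])
    for x, mu, ls in zip(continuous, out.mean, out.log_std):
        std = math.exp(ls)
        # log N(x; mu, std^2) = -0.5 ((x - mu) / std)^2 - log std - 0.5 log(2 pi)
        log_p += -0.5 * ((x - mu) / std) ** 2 - ls - 0.5 * math.log(2 * math.pi)
    return log_p
```

The independence assumption keeps the example short. A policy could instead condition the continuous parameters on the sampled discrete choice.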
This contrasts with state-of-the-art reinforcement learning agents, which typically learn each new task from scratch and struggle with knowledge transfer.
Image-based Reinforcement Learning is known to suffer from poor sample efficiency and poor generalisation to unseen visuals such as distractors (task-independent aspects of the observation space).
Applying reinforcement learning (RL) to physical systems presents notable challenges, as sample efficiency, safety, and physical constraints impose stricter requirements than in simulated environments.
no code implementations • 26 Jun 2019 • Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller
The successful application of general reinforcement learning algorithms to real-world robotics is often limited by their high data requirements.
Using an adversarial stethoscope successfully de-biases the network, increasing performance from 66% to 88%.
Recent successes in machine learning have led to a shift in the design of autonomous systems, improving performance on existing tasks and rendering new applications possible.
Due to recent advances in compute, data, and models, the role of learning in autonomous systems has expanded significantly, rendering new applications possible for the first time.
Conversely, when trained on an easy dataset where visual cues are positively correlated with stability, the baseline model learns a bias that leads to poor performance on a harder dataset.
Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks.
Continuous appearance shifts such as changes in weather and lighting conditions can impact the performance of deployed machine learning models.
The robot is trained in reverse, gradually learning to reach the goal from a set of start states increasingly far from the goal.
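Training in reverse can be sketched as a curriculum over start states whose distance from the goal grows as the agent succeeds. The sketch below is a toy illustration under stated assumptions, not the paper's procedure. It assumes a 1-D task with the goal at position 0 and samples start states uniformly within a radius. The radius widens by a fixed step once the success rate over a window of episodes passes a threshold. `ReverseCurriculum`, `step`, `promote_threshold` and `window` are hypothetical names and values.

```python
import random
from typing import List


class ReverseCurriculum:
    """Hypothetical reverse curriculum on a 1-D task with the goal at 0.

    Start states are drawn within `radius` of the goal. After every `window`
    episodes the success rate is checked; if it reaches `promote_threshold`,
    the radius grows by `step`, capped at `max_radius`.
    """

    def __init__(self, max_radius: float, step: float = 0.1,
                 promote_threshold: float = 0.8, window: int = 20) -> None:
        self.radius = step  # begin with start states very close to the goal
        self.max_radius = max_radius
        self.step = step
        self.promote_threshold = promote_threshold
        self.window = window
        self.results: List[bool] = []

    def sample_start(self, rng: random.Random) -> float:
        """Draw a start state uniformly within the current radius."""
        return rng.uniform(-self.radius, self.radius)

    def report(self, success: bool) -> None:
        """Record one episode outcome and widen the radius when warranted."""
        self.results.append(success)
        if len(self.results) >= self.window:
            rate = sum(self.results) / len(self.results)
            if rate >= self.promote_threshold:
                self.radius = min(self.radius + self.step, self.max_radius)
            self.results.clear()  # re-measure at the (possibly new) radius
```

Clearing the window after each check means every success rate is measured at a single radius, so outcomes gathered on easier, closer starts do not inflate the estimate for harder ones.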
Appearance changes due to weather and seasonal conditions represent a strong impediment to the robust implementation of machine learning systems in outdoor robotics.
Recent advances have shown that Fully Convolutional Neural Networks (FCNs) can model cost functions for motion planning when learning driving preferences purely from human drivers' demonstrations.
In this work, we present an approach to learn cost maps for driving in complex urban environments from a very large number of demonstrations of driving behaviour by human experts.
This paper presents a general framework for exploiting the representational capacity of neural networks to approximate complex, nonlinear reward functions in the context of solving the inverse reinforcement learning (IRL) problem.