Search Results for author: Martin Riedmiller

Found 63 papers, 20 papers with code

Less is more -- the Dispatcher/ Executor principle for multi-task Reinforcement Learning

no code implementations 14 Dec 2023 Martin Riedmiller, Tim Hertweck, Roland Hafner

While we agree on the power of scaling - in the sense of Sutton's 'bitter lesson' - we will give some evidence that considering structure and adding design principles can be a valuable and critical component, in particular when data is not abundant and infinite but is a precious resource.

Decision Making

Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning

no code implementations 14 Sep 2023 Cristina Pinneri, Sarah Bechtle, Markus Wulfmeier, Arunkumar Byravan, Jingwei Zhang, William F. Whitney, Martin Riedmiller

We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment.

Data Augmentation Offline RL +2

Policy composition in reinforcement learning via multi-objective policy optimization

no code implementations 29 Aug 2023 Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup

In two domains with continuous observation and action spaces, our agents successfully compose teacher policies in sequence and in parallel, and are also able to further extend the policies of the teachers in order to solve the task.


Towards A Unified Agent with Foundation Models

no code implementations 18 Jul 2023 Norman Di Palo, Arunkumar Byravan, Leonard Hasenclever, Markus Wulfmeier, Nicolas Heess, Martin Riedmiller

Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others.

Efficient Exploration Reinforcement Learning (RL) +2

Leveraging Jumpy Models for Planning and Fast Learning in Robotic Domains

no code implementations 24 Feb 2023 Jingwei Zhang, Jost Tobias Springenberg, Arunkumar Byravan, Leonard Hasenclever, Abbas Abdolmaleki, Dushyant Rao, Nicolas Heess, Martin Riedmiller

We conduct a set of experiments in the RGB-stacking environment, showing that planning with the learned skills and the associated model can enable zero-shot generalization to new tasks, and can further speed up training of policies via reinforcement learning.

reinforcement-learning Reinforcement Learning (RL) +1

Solving Continuous Control via Q-learning

1 code implementation 22 Oct 2022 Tim Seyde, Peter Werner, Wilko Schwarting, Igor Gilitschenski, Martin Riedmiller, Daniela Rus, Markus Wulfmeier

While there has been substantial success for solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces.

Continuous Control Multi-agent Reinforcement Learning +1

MO2: Model-Based Offline Options

no code implementations 5 Sep 2022 Sasha Salter, Markus Wulfmeier, Dhruva Tirumala, Nicolas Heess, Martin Riedmiller, Raia Hadsell, Dushyant Rao

The ability to discover useful behaviours from past experience and transfer them to new tasks is considered a core component of natural embodied intelligence.

Continuous Control

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

1 code implementation 21 Apr 2022 Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, SiQi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks.

Continuous Control reinforcement-learning +1

The Challenges of Exploration for Offline Reinforcement Learning

no code implementations 27 Jan 2022 Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour.

Model Predictive Control Offline RL +2

Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration

no code implementations 17 Sep 2021 Oliver Groth, Markus Wulfmeier, Giulia Vezzani, Vibhavari Dasagi, Tim Hertweck, Roland Hafner, Nicolas Heess, Martin Riedmiller

Curiosity-based reward schemes can present powerful exploration mechanisms which facilitate the discovery of solutions for complex, sparse or long-horizon tasks.

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

no code implementations 15 Jun 2021 Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.

Offline RL reinforcement-learning +1

Explicit Pareto Front Optimization for Constrained Reinforcement Learning

no code implementations 1 Jan 2021 Sandy Huang, Abbas Abdolmaleki, Philemon Brakel, Steven Bohez, Nicolas Heess, Martin Riedmiller, Raia Hadsell

We propose a framework that uses a multi-objective RL algorithm to find a Pareto front of policies that trades off between the reward and constraint(s), and simultaneously searches along this front for constraint-satisfying policies.

Continuous Control reinforcement-learning +1

Local Search for Policy Iteration in Continuous Control

no code implementations 12 Oct 2020 Jost Tobias Springenberg, Nicolas Heess, Daniel Mankowitz, Josh Merel, Arunkumar Byravan, Abbas Abdolmaleki, Jackie Kay, Jonas Degrave, Julian Schrittwieser, Yuval Tassa, Jonas Buchli, Dan Belov, Martin Riedmiller

We demonstrate that additional computation spent on model-based policy improvement during learning can improve data efficiency, and confirm that model-based policy improvement during action selection can also be beneficial.

Continuous Control Reinforcement Learning (RL)

Simple Sensor Intentions for Exploration

no code implementations 15 May 2020 Tim Hertweck, Martin Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Heess

In particular, we show that a real robotic arm can learn to grasp and lift and solve a Ball-in-a-Cup task from scratch, when only raw sensor streams are used for both controller input and in the auxiliary reward definition.

Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics

no code implementations 2 Jan 2020 Michael Neunert, Abbas Abdolmaleki, Markus Wulfmeier, Thomas Lampe, Jost Tobias Springenberg, Roland Hafner, Francesco Romano, Jonas Buchli, Nicolas Heess, Martin Riedmiller

In contrast, we propose to treat hybrid problems in their 'native' form by solving them with hybrid reinforcement learning, which optimizes for discrete and continuous actions simultaneously.

reinforcement-learning Reinforcement Learning (RL)

Quinoa: a Q-function You Infer Normalized Over Actions

no code implementations 5 Nov 2019 Jonas Degrave, Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Martin Riedmiller

We present an algorithm for learning an approximate action-value soft Q-function in the relative entropy regularised reinforcement learning setting, for which an optimal improved policy can be recovered in closed form.

Normalising Flows reinforcement-learning +1

Robust Reinforcement Learning for Continuous Control with Model Misspecification

no code implementations ICLR 2020 Daniel J. Mankowitz, Nir Levine, Rae Jeong, Yuanyuan Shi, Jackie Kay, Abbas Abdolmaleki, Jost Tobias Springenberg, Timothy Mann, Todd Hester, Martin Riedmiller

We provide a framework for incorporating robustness (to perturbations in the transition dynamics, which we refer to as model misspecification) into continuous control Reinforcement Learning (RL) algorithms.

Continuous Control reinforcement-learning +1

Relative Entropy Regularized Policy Iteration

1 code implementation 5 Dec 2018 Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa, Dan Belov, Nicolas Heess, Martin Riedmiller

Our algorithm draws on connections to existing literature on black-box optimization and 'RL as an inference' and it can be seen either as an extension of the Maximum a Posteriori Policy Optimisation algorithm (MPO) [Abdolmaleki et al., 2018a], or as an extension of Trust Region Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) [Abdolmaleki et al., 2017b; Hansen et al., 1997] to a policy iteration scheme.

Continuous Control OpenAI Gym +1

Maximum a Posteriori Policy Optimisation

3 code implementations ICLR 2018 Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller

We introduce a new algorithm for reinforcement learning called Maximum a Posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective.

Continuous Control reinforcement-learning +1

Graph networks as learnable physics engines for inference and control

1 code implementation ICML 2018 Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, Peter Battaglia

Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model.

Inductive Bias

DeepMind Control Suite

8 code implementations 2 Jan 2018 Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, Martin Riedmiller

The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents.

Continuous Control reinforcement-learning +1

PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations

no code implementations 27 May 2017 Rico Jonschkowski, Roland Hafner, Jonathan Scholz, Martin Riedmiller

We propose position-velocity encoders (PVEs) which learn, without supervision, to encode images to positions and velocities of task-relevant objects.

Image Reconstruction Position

Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

1 code implementation NeurIPS 2015 Manuel Watter, Jost Tobias Springenberg, Joschka Boedecker, Martin Riedmiller

We introduce Embed to Control (E2C), a method for model learning and control of non-linear dynamical systems from raw pixel images.

Human level control through deep reinforcement learning

7 code implementations 25 Feb 2015 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis

We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters.

Atari Games reinforcement-learning +1

Striving for Simplicity: The All Convolutional Net

37 code implementations 21 Dec 2014 Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller

Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: Alternating convolution and max-pooling layers followed by a small number of fully connected layers.

Image Classification Object +1

Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

1 code implementation 26 Jun 2014 Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox

While such generic features cannot compete with class specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.

General Classification Geometric Matching

Playing Atari with Deep Reinforcement Learning

111 code implementations 19 Dec 2013 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

Atari Games Q-Learning +1
