no code implementations • 14 Apr 2024 • Simon Eisenmann, Daniel Hein, Steffen Udluft, Thomas A. Runkler
The policy is optimized with a gradient-free optimization scheme using the return estimate given by the model as the fitness function.
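The scheme described above can be sketched in a few lines: a policy parameter is mutated by a gradient-free (1+1) evolution strategy, and each candidate is scored by the return estimate obtained from rolling it out in a model. The linear dynamics model, linear policy, and ES variant below are illustrative stand-ins, not the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_step(state, action):
    # Hypothetical learned dynamics model: returns next state and reward.
    next_state = 0.9 * state + 0.1 * action
    reward = -float(state**2 + 0.1 * action**2)
    return next_state, reward

def estimated_return(theta, horizon=50):
    # Fitness function: the return estimate from rolling the policy
    # out in the model, as described in the abstract.
    state, total = 1.0, 0.0
    for _ in range(horizon):
        action = theta * state              # simple linear policy
        state, reward = model_step(state, action)
        total += reward
    return total

# Gradient-free (1+1) evolution strategy: mutate, keep the candidate if fitter.
theta, best = 0.0, estimated_return(0.0)
for _ in range(200):
    candidate = theta + rng.normal(scale=0.3)
    fitness = estimated_return(candidate)
    if fitness > best:
        theta, best = candidate, fitness
```

Any population-based or direct-search optimizer (CMA-ES, particle swarms) slots into the same loop, since only fitness evaluations of the model-based return are required, never gradients.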
no code implementations • 11 Aug 2023 • Marc Weber, Phillip Swazinna, Daniel Hein, Steffen Udluft, Volkmar Sterzing
Offline reinforcement learning provides a viable approach to obtain advanced control strategies for dynamical systems, in particular when direct interaction with the environment is not available.
no code implementations • 16 Jun 2023 • Phillip Swazinna, Steffen Udluft, Thomas Runkler
Recently, offline RL algorithms have been proposed that remain adaptive at runtime.
1 code implementation • 1 Aug 2022 • Philipp Scholl, Felix Dietrich, Clemens Otte, Steffen Udluft
Based on this finding, we develop adaptations, the Adv-Soft-SPIBB algorithms, and show that they are provably safe.
no code implementations • 9 Jun 2022 • Simon Wiedemann, Daniel Hein, Steffen Udluft, Christian Mendl
We present a full implementation and simulation of a novel quantum reinforcement learning method.
1 code implementation • 21 May 2022 • Phillip Swazinna, Steffen Udluft, Thomas Runkler
At the same time, offline RL algorithms are not able to tune their most important hyperparameter: the proximity of the learned policy to the original policy.
1 code implementation • 28 Jan 2022 • Philipp Scholl, Felix Dietrich, Clemens Otte, Steffen Udluft
Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy.
1 code implementation • 14 Jan 2022 • Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler
Offline reinforcement learning (RL) algorithms are often designed with environments such as MuJoCo in mind, in which the planning horizon is extremely long and no noise exists.
no code implementations • 26 Nov 2021 • Phillip Swazinna, Steffen Udluft, Thomas Runkler
Recently developed offline reinforcement learning algorithms have made it possible to learn policies directly from pre-collected datasets, giving rise to a new dilemma for practitioners: since the performance these algorithms deliver depends greatly on the dataset presented to them, practitioners need to pick the right dataset among those available.
1 code implementation • 12 Jul 2021 • Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler
In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset.
no code implementations • 12 Aug 2020 • Phillip Swazinna, Steffen Udluft, Thomas Runkler
State-of-the-art reinforcement learning algorithms mostly rely on being allowed to directly interact with their environment to collect millions of observations.
no code implementations • 29 Apr 2018 • Daniel Hein, Steffen Udluft, Thomas A. Runkler
Autonomously training interpretable control strategies, called policies, using pre-existing plant trajectory data is of great interest in industrial applications.
no code implementations • 12 Dec 2017 • Daniel Hein, Steffen Udluft, Thomas A. Runkler
Here we introduce the genetic programming for reinforcement learning (GPRL) approach, based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples.
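The core idea of evolving interpretable policy equations against a learned model can be sketched as follows. This is a deliberately minimal toy, not the GPRL algorithm itself: the expression grammar, dynamics model, and selection scheme are all hypothetical, and mutation/crossover are omitted for brevity.

```python
import random

random.seed(0)

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}

def random_expr(depth=2):
    # Grow a random expression tree over the state variable 's' and constants.
    if depth == 0 or random.random() < 0.3:
        return 's' if random.random() < 0.5 else round(random.uniform(-2, 2), 2)
    op = random.choice(list(OPS))
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, s):
    # Interpret an expression tree: leaves are 's' or constants.
    if expr == 's':
        return s
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, s), evaluate(right, s))

def fitness(expr, horizon=20):
    # Return estimate from a stand-in learned model (model-based batch RL):
    # no interaction with the real system is needed.
    s, total = 1.0, 0.0
    for _ in range(horizon):
        a = max(-1.0, min(1.0, evaluate(expr, s)))  # clip action to [-1, 1]
        s = 0.9 * s + 0.1 * a
        total += -(s ** 2)
    return total

# Evolve: keep the fitter half, refill with fresh random expressions.
pop = [random_expr() for _ in range(40)]
for _ in range(15):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:20] + [random_expr() for _ in range(20)]
best = max(pop, key=fitness)
```

The payoff of this representation is that `best` is a human-readable equation, e.g. a small algebraic expression in `s`, rather than a neural network.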
no code implementations • 10 Dec 2017 • Stefan Depeweg, José Miguel Hernández-Lobato, Steffen Udluft, Thomas Runkler
We derive a novel sensitivity analysis of input variables for predictive epistemic and aleatoric uncertainty.
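A minimal sketch of the two ingredients named above: decomposing predictive uncertainty into epistemic and aleatoric parts via the law of total variance, and probing the sensitivity of one part to an input by finite differences. The predictive model and the heteroscedastic noise law below are illustrative assumptions, not the paper's method or its derived sensitivity formula.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict(x, n_samples=50):
    # Stand-in for a Bayesian neural network: each posterior sample yields a
    # predictive mean; the noise variance grows with the input (heteroscedastic).
    means = np.sin(x) + rng.normal(scale=0.05, size=n_samples)
    noise_var = 0.1 + 0.05 * x**2
    return means, noise_var

def decompose(x):
    # Law of total variance: epistemic uncertainty is the variance of the
    # sampled means; aleatoric uncertainty is the expected noise variance.
    means, noise_var = predict(x)
    return means.var(), noise_var

def aleatoric_sensitivity(x, eps=1e-4):
    # Finite-difference sensitivity of the aleatoric part w.r.t. the input.
    _, a_plus = decompose(x + eps)
    _, a_minus = decompose(x - eps)
    return (a_plus - a_minus) / (2 * eps)

epi, ale = decompose(1.0)
```

Note that the epistemic term is itself a Monte Carlo estimate, so in practice its input sensitivity needs variance reduction or analytic gradients rather than naive finite differences.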
1 code implementation • ICML 2018 • Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft
Bayesian neural networks with latent variables are scalable and flexible probabilistic models: They account for uncertainty in the estimation of the network weights and, by making use of latent variables, can capture complex noise patterns in the data.
2 code implementations • 27 Sep 2017 • Daniel Hein, Stefan Depeweg, Michel Tokic, Steffen Udluft, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing
On the one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method at hand.
no code implementations • 26 Jun 2017 • Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft
Bayesian neural networks (BNNs) with latent variables are probabilistic models which can automatically identify complex stochastic patterns in the data.
no code implementations • 20 May 2017 • Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing
The Particle Swarm Optimization Policy (PSO-P) was recently introduced and has been shown to produce remarkable results when interacting with academic reinforcement learning benchmarks in an off-policy, batch-based setting.
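The PSO-P idea can be sketched as receding-horizon planning: a particle swarm searches directly over action sequences scored by a learned model, and only the first action of the best sequence is executed. The model, the horizon, and the PSO coefficients below are illustrative choices, not the published configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

def model_rollout(state, actions):
    # Hypothetical learned model: estimated return of an action sequence.
    total = 0.0
    for a in actions:
        state = 0.9 * state + 0.1 * a
        total += -(state**2)
    return total

def pso_plan(state, horizon=10, n_particles=30, iters=40):
    # Particles are whole action sequences; standard PSO velocity update
    # with inertia, cognitive, and social terms.
    pos = rng.uniform(-1, 1, (n_particles, horizon))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([model_rollout(state, p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -1, 1)   # respect action bounds
        fit = np.array([model_rollout(state, p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest[0]  # execute only the first action (receding horizon)

action = pso_plan(state=1.0)
```

Because the search happens at decision time against a model, the approach needs no explicit policy representation, which is what makes it suited to the off-policy, batch-based setting mentioned above.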
no code implementations • 19 Oct 2016 • Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft
To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL.
no code implementations • 12 Oct 2016 • Daniel Hein, Alexander Hentschel, Volkmar Sterzing, Michel Tokic, Steffen Udluft
A novel reinforcement learning benchmark, called Industrial Benchmark, is introduced.
2 code implementations • 23 May 2016 • Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft
We present an algorithm for model-based reinforcement learning that combines Bayesian neural networks (BNNs) with random roll-outs and stochastic optimization for policy learning.
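The combination named above can be sketched as follows: dynamics models are sampled from a posterior (standing in for a BNN), policies are scored by averaging returns over random roll-outs through those samples, and the policy is improved by stochastic optimization. The linear model family, Gaussian posterior, and random-search optimizer are illustrative simplifications, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_model():
    # Stand-in for drawing one plausible dynamics model from a BNN posterior;
    # additive noise in the step mimics the model's stochastic transitions.
    a = rng.normal(0.9, 0.02)
    b = rng.normal(0.1, 0.02)
    def step(s, u):
        return a * s + b * u + rng.normal(scale=0.01)
    return step

def expected_return(theta, n_rollouts=20, horizon=30):
    # Average return over random roll-outs through sampled models, so the
    # policy is evaluated under model uncertainty rather than a point estimate.
    total = 0.0
    for _ in range(n_rollouts):
        step, s = sample_model(), 1.0
        for _ in range(horizon):
            s = step(s, theta * s)      # linear policy
            total += -(s**2)
    return total / n_rollouts

# Stochastic optimization of the policy parameter via simple random search.
best_theta, best = 0.0, expected_return(0.0)
for _ in range(50):
    candidate = best_theta + rng.normal(scale=0.2)
    value = expected_return(candidate)
    if value > best:
        best_theta, best = candidate, value
```

Averaging over posterior samples penalizes policies that only perform well under one plausible model, which is the practical benefit of pairing BNNs with random roll-outs.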