The generated robot motions are further adapted with Inverse Kinematics to ensure the desired physical proximity with a human, combining the ease of joint space learning and accurate task space reachability.
Multi-Task Reinforcement Learning (MTRL) tackles the long-standing problem of endowing agents with skills that generalize across a variety of problems.
The advent of tactile sensors in robotics has sparked many ideas on how robots can leverage direct contact measurements of their environment interactions to improve manipulation tasks.
Stateful policies play an important role in reinforcement learning, e.g., for handling partially observable environments, enhancing robustness, or imposing an inductive bias directly on the policy structure.
Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied agents.
Varying dynamics parameters in simulation is a popular Domain Randomization (DR) approach for overcoming the reality gap in Reinforcement Learning (RL).
To this end, Robust Adversarial Reinforcement Learning (RARL) trains a protagonist against destabilizing forces exerted by an adversary in a competitive zero-sum Markov game, whose optimal solution, i.e., rational strategy, corresponds to a Nash equilibrium.
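As a hedged sketch of this game (the notation here is ours, not the source's): with protagonist policy $\pi$ and adversary policy $\nu$, both players share a single return that the protagonist maximizes and the adversary minimizes,

$$\max_{\pi} \min_{\nu} \; \mathbb{E}_{\pi, \nu}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t^{\pi}, a_t^{\nu})\Big],$$

and a Nash equilibrium is a strategy pair from which neither player can gain by deviating unilaterally.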
In this work, we focus on framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL.
Reinforcement Learning (RL) allows learning non-trivial robot control laws purely from data.
We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process.
Learning priors on trajectory distributions can help accelerate robot motion planning optimization.
Bayesian deep learning approaches assume model parameters to be latent random variables and infer posterior distributions to quantify uncertainty, increase safety and trust, and prevent overconfident and unpredictable behavior.
Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents.
We propose a model predictive control approach for autonomous vehicles that exploits learned Gaussian processes for predicting human driving behavior.
Therefore, we conclude that the limitation of model-based value expansion methods is not the model accuracy of the learned models.
Recent methods for imitation learning directly learn a $Q$-function using an implicit reward formulation rather than an explicit reward function.
Deployment of Reinforcement Learning (RL) algorithms for robotics applications in the real world requires ensuring the safety of the robot and its environment.
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
Motion planning is a mature area of research in robotics with many well-established methods based on optimization or sampling the state space, suitable for solving kinematic motion planning.
We consider a sequential decision making task where we are not allowed to evaluate parameters that violate an a priori unknown (safety) constraint.
On the one hand, we found that PAC-Bayes bounds are a useful tool for designing offline bandit algorithms with performance guarantees.
On the other hand, the proposed Decision LSTM is able to achieve expert-level performance on these tasks, in addition to learning a swing-up controller on the real system.
We derive two efficient variational inference techniques to learn these representations and highlight the advantages of hierarchical infinite local regression models, such as dealing with non-smooth functions, mitigating catastrophic forgetting, and enabling parameter sharing and fast predictions.
Robotic manipulation stands as a largely unsolved problem despite significant advances in robotics and machine learning in recent years.
Modeling interaction dynamics to generate robot trajectories that enable a robot to adapt and react to a human's actions and intentions is critical for efficient and effective collaborative Human-Robot Interactions (HRI).
To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method.
Monte Carlo methods have become increasingly relevant for control of non-differentiable systems, approximate dynamics models and learning from data.
Our proposed approach achieves state-of-the-art performance in simulated high-dimensional and dynamic tasks while avoiding collisions with the environment.
Recent works based on state-visitation counts, curiosity and entropy-maximization generate intrinsic reward signals to motivate the agent to visit novel states for exploration.
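For concreteness, here is a minimal, hypothetical sketch of one such signal, a tabular count-based bonus that decays as $1/\sqrt{N(s)}$; the class name and the discretization are illustrative assumptions, not any specific paper's implementation:

```python
from collections import defaultdict

import numpy as np


class CountBonus:
    """Tabular count-based intrinsic reward: bonus ~ beta / sqrt(N(s))."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def __call__(self, state) -> float:
        # Hash the (discretized) state; continuous states would need
        # binning or pseudo-counts instead of an exact table.
        key = tuple(np.asarray(state).ravel().tolist())
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])


# Usage: add the bonus to the extrinsic reward at every step.
bonus = CountBonus(beta=0.1)
reward = 0.0 + bonus([0, 1])  # first visit -> full bonus beta
```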
Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences.
In this work, we focus on learning SE(3) diffusion models for 6DoF grasping, giving rise to a novel framework for joint grasp and motion optimization without needing to decouple grasp selection from trajectory generation.
In this paper, we focus on the problem of integrating Energy-based Models (EBM) as guiding priors for motion optimization.
Model-based value expansion methods promise to improve the quality of value function targets and, thereby, the effectiveness of value function learning.
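As a hedged sketch (notation ours), the standard $H$-step value-expansion target unrolls the learned model $\hat{f}, \hat{r}$ for $H$ imagined steps and bootstraps the tail with the current value estimate $V_{\phi}$:

$$\hat{V}^{H}(s_t) = \sum_{h=0}^{H-1} \gamma^{h}\, \hat{r}(\hat{s}_{t+h}, \hat{a}_{t+h}) + \gamma^{H}\, V_{\phi}(\hat{s}_{t+H}), \qquad \hat{s}_{t+h+1} = \hat{f}(\hat{s}_{t+h}, \hat{a}_{t+h}), \quad \hat{s}_t = s_t.$$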
Task and Motion Planning (TAMP) provides a hierarchical framework for handling the sequential nature of manipulation tasks by interleaving a symbolic task planner, which generates a possible action sequence, with a motion planner, which checks kinematic feasibility in the geometric world and generates robot trajectories if several constraints are satisfied, e.g., a collision-free trajectory from one state to another.
Autonomous robots should operate in real-world dynamic environments and collaborate with humans in tight spaces.
Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates the policies at the parameter level.
Robot assembly discovery is a challenging problem that lives at the intersection of resource allocation and motion planning.
This work contributes to a better understanding and modeling of the human driver, aiming to expedite simulation methods in the modern vehicle development process and potentially supporting automated driving and racing technologies.
Recent methods for reinforcement learning from images use auxiliary tasks to learn image features that are used by the agent's policy or Q-function.
In this work, we propose two methods for improving the convergence rate and exploration based on a newly introduced backup operator and entropy regularization.
However, these methods are notorious for the enormous amount of required training data which is prohibitively expensive to collect on real robots.
The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data.
Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important, as it yields the optimal policy that achieves the maximum reward on a given task.
Especially for learning dynamics models, such black-box models are undesirable, as the underlying principles are well understood and standard deep networks can learn dynamics that violate these principles.
Bayesian deep learning approaches assume model parameters to be latent random variables and infer posterior predictive distributions to quantify uncertainty, increase safety and trust, and prevent overconfident and unpredictable behavior.
This approach, which we refer to as boosted curriculum reinforcement learning (BCRL), has the benefit of naturally increasing the representativeness of the functional space by adding a new residual each time a new task is presented.
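In a hedged sketch of this construction (notation ours), the value function after $K$ curriculum stages is an additive ensemble

$$V_K(s) = \sum_{k=1}^{K} \rho_k(s),$$

where, following the boosting analogy, the newest residual $\rho_K$ is fitted to what the frozen sum of earlier residuals still gets wrong on task $K$.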
Experiments demonstrate that the resulting introduction of metric structure into the curriculum allows for a well-behaving non-parametric version of SPRL that leads to stable learning performance across tasks.
This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators.
We show that while such an agent is still novelty-seeking, i.e., interested in exploring the whole state space, it focuses its exploration where its perceived influence is greater, avoiding areas of greater stochasticity or traps that limit its control.
We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces.
The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics.
A key feature of intelligent behaviour is the ability to learn abstract strategies that scale and transfer to unfamiliar problems.
Reactive motion generation problems are usually solved by computing actions as a sum of policies.
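One common instantiation of this pattern, e.g., in Riemannian Motion Policies, combines the component policies' accelerations through positive-(semi)definite importance metrics; as a hedged sketch in our notation,

$$a(x) = \Big(\sum_i M_i(x)\Big)^{-1} \sum_i M_i(x)\, a_i(x),$$

so each component $i$ contributes its desired acceleration $a_i$ with weight $M_i(x)$.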
Due to recent breakthroughs, reinforcement learning (RL) has demonstrated impressive performance in challenging sequential decision-making problems.
Trajectory optimization and model predictive control are essential techniques underpinning advanced robotic applications, ranging from autonomous driving to full-body humanoid control.
Substantial advancements in model-based reinforcement learning algorithms have been impeded by the model bias induced by the collected data, which generally hurts performance.
Discrete-time stochastic optimal control remains a challenging problem for general, nonlinear systems under significant uncertainty, with practical solvers typically relying on the certainty equivalence assumption, replanning and/or extensive regularization.
Moreover, we effectively combine this skeleton space with the resultant motion variable spaces into a single extended decision space.
Across machine learning, the use of curricula has shown strong empirical potential to improve learning from data by avoiding local optima of training objectives.
This report presents the debates, posters, and discussions of the Sim2Real workshop held in conjunction with the 2020 edition of the "Robotics: Science and Systems" conference.
Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers.
Neural linear models (NLM) and Gaussian processes (GP) are both examples of Bayesian linear regression on rich feature spaces.
We then propose an optimization algorithm that follows the gradient of the composition of the objective and the projection and prove its convergence for linear objectives and arbitrary convex and Lipschitz domain defining inequality constraints.
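For reference, this composite gradient is just the chain rule: with objective $f$ and differentiable projection $P$ onto the feasible set,

$$\nabla_x\, f(P(x)) = J_P(x)^{\top}\, \nabla f(P(x)),$$

where $J_P$ denotes the Jacobian of the projection.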
Probabilistic regression techniques in control and robotics applications have to fulfill different criteria of data-driven adaptability, computational efficiency, scalability to high dimensions, and the capacity to deal with different modalities in the data.
A limitation of model-based reinforcement learning (MBRL) is the exploitation of errors in the learned models.
Off-policy Reinforcement Learning (RL) holds the promise of better data efficiency as it allows sample reuse and potentially enables safe interaction with the environment.
Learning in the physical world will be important for enabling robots to escape their stiff, pre-programmed movements.
Parameterized movement primitives have been extensively used for imitation learning of robotic tasks.
We introduce ImitationFlow, a novel deep generative model that allows learning complex, globally stable, stochastic, nonlinear dynamics.
Active inference (AI) is a persuasive theoretical framework from computational neuroscience that seeks to describe action and perception as inference-based computation.
One approach that was recently used to autonomously generate a repertoire of diverse skills is a novelty-based Quality-Diversity (QD) algorithm.
Our deterministic approximation of the transition kernel is applicable to both training and prediction.
This work is the first to (a) achieve fail-safe learning of a safety-critical dynamic task using anthropomorphic robot arms, (b) learn a precision-demanding problem with a PAM-driven system despite the control challenges, and (c) train robots to play table tennis without real balls.
Reinforcement learning (RL) has demonstrated its ability to solve high dimensional tasks by leveraging non-linear function approximators.
Inherent morphological characteristics in objects may offer a wide range of plausible grasping orientations that obfuscates the visual learning of robotic grasping.
The control of nonlinear dynamical systems remains a major challenge for autonomous agents.
We study the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning.
Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning.
In this regard, Weighted Q-Learning (WQL) effectively reduces bias and shows remarkable results in stochastic environments.
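To illustrate the idea behind such weighted estimators, here is a minimal Monte Carlo sketch (the actual WQL weights are computed in closed form from Gaussian approximations; the function name and the sampling shortcut here are ours):

```python
import numpy as np


def weighted_max(q_means, q_stds, n_samples=10_000, rng=None):
    """Monte Carlo version of a weighted maximum estimator.

    Instead of max(q_means), which over-estimates under noise,
    each action's mean is weighted by the estimated probability
    that it is the true argmax, given Gaussian uncertainty on
    the action-value estimates.
    """
    rng = rng or np.random.default_rng()
    samples = rng.normal(q_means, q_stds, size=(n_samples, len(q_means)))
    weights = np.bincount(samples.argmax(axis=1),
                          minlength=len(q_means)) / n_samples
    return float(weights @ q_means)


# Two actions with equal estimated values: each receives ~0.5 weight,
# so the estimate equals the common mean instead of an inflated max.
print(weighted_max(np.array([1.0, 1.0]), np.array([0.5, 0.5])))
```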
Learning to control robots without requiring engineered models has been a long-term goal, promising diverse and novel applications.
The ARL framework utilizes an adversary, which is trained to steer the original agent, the protagonist, to challenging states.
Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) during training according to a distribution over domain parameters in order to obtain more robust policies that are able to overcome the reality gap.
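As a hedged sketch of this ingredient of the training loop, with hypothetical parameter names and bounds (not taken from any paper):

```python
import numpy as np

# Hypothetical domain-parameter ranges for a physics simulator.
PARAM_RANGES = {
    "link_mass": (0.8, 1.2),      # kg, uniform
    "friction": (0.5, 1.5),       # coefficient, uniform
    "action_delay": (0, 3),       # timesteps, uniform integer
}


def sample_domain(rng: np.random.Generator) -> dict:
    """Draw one source-domain instance for the next training episode."""
    params = {}
    for name, (lo, hi) in PARAM_RANGES.items():
        if isinstance(lo, int) and isinstance(hi, int):
            params[name] = int(rng.integers(lo, hi + 1))
        else:
            params[name] = float(rng.uniform(lo, hi))
    return params


rng = np.random.default_rng(seed=0)
episode_params = sample_domain(rng)
# The simulator would be reconfigured with episode_params before each rollout.
```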
Therefore, differential equations are a promising approach to incorporate prior knowledge in machine learning models to obtain robust and interpretable models.
The empirical analysis shows that the dimensionality reduction in parameter space is more effective than in configuration space, as it enables the representation of the movements with a significant reduction of parameters.
The development of autonomous robotic systems that can learn from human demonstrations to imitate a desired behavior - rather than being manually programmed - has huge technological potential.
The Nadaraya-Watson kernel estimator is among the most popular nonparametric regression techniques thanks to its simplicity.
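For a kernel $K$ with bandwidth $h$ and data $\{(x_i, y_i)\}_{i=1}^{n}$, the estimator is simply the kernel-weighted average of the observed targets:

$$\hat{f}(x) = \frac{\sum_{i=1}^{n} K\big((x - x_i)/h\big)\, y_i}{\sum_{j=1}^{n} K\big((x - x_j)/h\big)}.$$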
To approach this problem, we propose Probabilistic Modeling of Driver behavior (ProMoD), a modular framework which splits the task of driver behavior modeling into multiple modules.
Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes.
MushroomRL is an open-source Python library developed to simplify the process of implementing and running Reinforcement Learning (RL) experiments.
Empirical results on classic and novel benchmarks show that the proposed approach outperforms existing methods in environments with sparse rewards, especially in the presence of rewards that create suboptimal modes of the objective function.
In order to reach similar performance, we developed a hierarchical Bayesian optimization algorithm that replicates the cognitive inference and memorization process for avoiding failures in motor control tasks.
Finally, we empirically demonstrate the effectiveness of our method on well-known MDP and POMDP benchmarks, showing significant improvements in performance and convergence speed.
A key feature of intelligent behavior is the ability to learn abstract strategies that transfer to unfamiliar problems.
Sample-efficient exploration is crucial not only for discovering rewarding experiences but also for adapting to environment changes in a task-agnostic fashion.
Generalization and adaptation of learned skills to novel situations is a core requirement for intelligent autonomous robots.
The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellman differential equation and minimizing the error of this equality, while simultaneously decreasing the discounting from short- to far-sighted to enable learning.
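For context, one common discounted continuous-time form of the Hamilton-Jacobi-Bellman equation whose residual is minimized reads (notation ours)

$$\rho\, V^{*}(x) = \max_{u}\Big[r(x, u) + \nabla_x V^{*}(x)^{\top} f(x, u)\Big],$$

with discount rate $\rho$, reward $r$, and dynamics $\dot{x} = f(x, u)$.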
Our method uses encoder and decoder deep networks that map complete or partial trajectories to a Gaussian-distributed latent space and back, allowing for fast inference of the future values of a trajectory given previous observations.
Model-based Reinforcement Learning (MBRL) allows data-efficient learning, which is required in real-world applications such as robotics.
Accordingly, for learning a new task, time could be saved by restricting the parameter search space by initializing it with the solution of a similar task.
DeLaN can efficiently learn the equations of motion of a mechanical system (i.e., its system dynamics) with a deep network while ensuring physical plausibility.
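Concretely, the Lagrangian rigid-body dynamics that DeLaN imposes as structure are

$$H(q)\,\ddot{q} + c(q, \dot{q}) + g(q) = \tau,$$

with joint positions $q$, inertia matrix $H(q)$ (parameterized by the network so that it is positive definite), Coriolis and centrifugal forces $c$, gravitational forces $g$, and applied torques $\tau$.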
Optimizing a policy on a slightly faulty simulator can easily lead to the maximization of the Simulation Optimization Bias (SOB).
Applying deep learning to control has great potential for enabling the intelligent design of robot control laws.
The approach is evaluated in a collaborative human-robot interaction task with a 7-DoF robot arm.
System identification of complex and nonlinear systems is a central problem for model predictive control and model-based reinforcement learning.
In quadruped gait learning, policy search methods that scale to high-dimensional continuous action spaces are commonly used.
Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest without the ability to communicate.
Sample efficiency is a crucial problem in deep reinforcement learning.
Online detection of instantaneous changes in the generative process of a data sequence generally focuses on retrospective inference of such change points without considering their future occurrences.
Previously, the exploding gradient problem has been argued to be central in deep learning and model-based reinforcement learning, as it causes numerical issues and instability in optimization.
Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability.
This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning.
However, to capture variability and correlations between different joints, a probabilistic movement primitive requires the estimation of a larger number of parameters than its deterministic counterpart, which models only the mean behavior.
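A quick parameter count makes this concrete: representing a trajectory as $y_t = \Phi_t\, w$ with weights $w \sim \mathcal{N}(\mu_w, \Sigma_w)$, a system with $D$ joints and $B$ basis functions per joint needs $DB$ parameters for the mean alone, but a full covariance $\Sigma_w$ adds $DB(DB+1)/2$ more, and it is exactly these off-diagonal entries that encode the cross-joint correlations.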
Using movement primitive libraries is an effective means to enable robots to solve more complex tasks.
Advances in the field of inverse reinforcement learning (IRL) have led to sophisticated inference frameworks that relax the original modeling assumption of observing an agent behavior that reflects only a single intention.
By using learning signals that mimic the intrinsic motivation signal of cognitive dissonance, together with a mental replay strategy to intensify experiences, the stochastic recurrent network can learn from few physical interactions and adapt to novel environments within seconds.
We carry out asymptotic analysis of the solutions for different values of $\alpha$ and demonstrate the effects of using different divergence functions on a multi-armed bandit problem and on common standard reinforcement learning problems.
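For reference, one common parameterization of this divergence family is (conventions differ across papers)

$$D_{\alpha}(p \,\|\, q) = \frac{1}{\alpha(\alpha - 1)}\left(\int p(x)^{\alpha}\, q(x)^{1 - \alpha}\, dx - 1\right),$$

which recovers $\mathrm{KL}(p \,\|\, q)$ as $\alpha \to 1$ and $\mathrm{KL}(q \,\|\, p)$ as $\alpha \to 0$.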
Bayesian optimization is renowned for its sample efficiency but its application to higher dimensional tasks is impeded by its focus on global optimization.
A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored.
In order to show the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme to derive a lower bound of the change in policy return between successive iterations.
This feature space is often learned in an unsupervised way, which might lead to data representations that are not useful for the overall regression task.
Learning policies that generalize across multiple tasks is an important and challenging research topic in reinforcement learning and robotics.