In this paper, we propose a safe reinforcement learning framework that switches between a safe recovery policy that prevents the robot from entering unsafe states, and a learner policy that is optimized to complete the task.
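A minimal sketch of this switching logic, assuming hypothetical `learner_policy`, `recovery_policy`, and `is_unsafe` callables (none of these names come from the paper):

```python
def select_action(state, learner_policy, recovery_policy, is_unsafe):
    """Switch between policies: use the safe recovery policy whenever the
    safety predicate flags the proposed action, otherwise use the learner.
    All names here are illustrative assumptions, not the paper's API."""
    action = learner_policy(state)
    if is_unsafe(state, action):  # e.g., a predicted fall or joint-limit violation
        action = recovery_policy(state)
    return action
```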
One of the key challenges in deep reinforcement learning (deep RL) is ensuring safety during both the training and testing phases.
Recent advances in deep reinforcement learning and scalable photorealistic simulation have led to increasingly mature embodied AI for various visual tasks, including navigation.
Several prior works on navigation have used Success weighted by Path Length (SPL) as the primary metric for evaluating the path an agent takes to a goal location, but SPL is limited in its ability to properly evaluate agents with complex dynamics.
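For reference, SPL averages the per-episode success indicator weighted by the ratio of the shortest-path length to the length of the path the agent actually took; a minimal sketch of the standard computation:

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length:
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where S_i is the success indicator for episode i, l_i the shortest-path
    length from start to goal, and p_i the length of the agent's actual path."""
    n = len(successes)
    return sum(
        s * l / max(p, l)
        for s, l, p in zip(successes, shortest_lengths, path_lengths)
    ) / n
```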
An EAP takes as input the predicted future state error in the target environment, which is provided by an error-prediction function trained simultaneously with the EAP.
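A minimal sketch of such an architecture in PyTorch; the layer sizes, dimensions, and the choice to condition the error predictor on the previous action are illustrative assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class ErrorAwarePolicy(nn.Module):
    """Sketch of a policy conditioned on a predicted future state error.
    The error-prediction head is trained jointly with the policy; all
    dimensions and layer sizes here are illustrative assumptions."""
    def __init__(self, obs_dim, act_dim, err_dim, hidden=256):
        super().__init__()
        # predicts the future state error in the target environment
        self.error_predictor = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, err_dim),
        )
        # the policy consumes the observation plus the predicted error
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + err_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, prev_action):
        err = self.error_predictor(torch.cat([obs, prev_action], dim=-1))
        return self.policy(torch.cat([obs, err], dim=-1))
```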
Therefore, learning a navigation policy for a new robot with a new sensor configuration or a new target remains a challenging problem.
Recent advances in deep reinforcement learning (deep RL) enable researchers to solve challenging control problems, from simulated environments to real-world robotic tasks.
Safety is an essential component for deploying reinforcement learning (RL) algorithms in real-world scenarios, and is critical during the learning process itself.
In this paper, we develop a system for learning legged locomotion policies with deep RL in the real world with minimal human effort.
The key idea behind MSO is to expose the same adaptation process, Strategy Optimization (SO), to both the training and testing phases.
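A minimal sketch of what such a shared adaptation loop might look like, using simple random search over a low-dimensional strategy vector that conditions the policy; the search procedure, helper names, and hyperparameters are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def rollout_return(env, policy, strategy):
    """Run one episode with the policy conditioned on `strategy`."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs, strategy))
        total += reward
    return total

def strategy_optimization(env, policy, dim=4, num_iters=10, pop=8, sigma=0.1):
    """Sketch of Strategy Optimization (SO): search over a low-dimensional
    strategy vector to maximize episode return. The point of MSO is that
    this same loop runs in both the training and testing phases."""
    best, best_ret = np.zeros(dim), -np.inf
    for _ in range(num_iters):
        for _ in range(pop):
            candidate = best + sigma * np.random.randn(dim)
            ret = rollout_return(env, policy, candidate)
            if ret > best_ret:
                best, best_ret = candidate, ret
    return best
```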
Here, we propose a zero-shot imitation learning approach for training a visual navigation policy on legged robots from human (third-person perspective) demonstrations, enabling high-quality navigation and cost-effective data collection.
Using our method, we train a robotic arm to estimate the mass distribution of an object with moving parts (e.g., an articulated rigid body system) by pushing it on a surface with unknown friction properties.
In this paper, we propose a sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies.
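Maximum entropy RL augments the return with an entropy bonus; a minimal sketch of the resulting soft Bellman target (the standard formulation, e.g., as used in soft actor-critic, with the temperature `alpha` trading off reward and entropy):

```python
def soft_q_target(reward, next_q, next_log_pi, alpha=0.2, gamma=0.99, done=0.0):
    """Soft Bellman backup used in maximum entropy RL:
    y = r + gamma * (1 - done) * (Q(s', a') - alpha * log pi(a'|s')).
    The -alpha * log pi term is the entropy bonus; values here are
    typical defaults, not tuned settings."""
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_log_pi)
```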
A fork of OpenAI Baselines with implementations of reinforcement learning algorithms.
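This description matches Stable Baselines; assuming that is the repository in question, a minimal usage example with its documented API:

```python
import gym
from stable_baselines import PPO2

# Train PPO on CartPole with the built-in MLP policy; hyperparameters
# are left at their defaults.
env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Query the trained policy for an action.
obs = env.reset()
action, _states = model.predict(obs)
```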
With this mixture-of-actor-critic architecture, discrete contact-sequence planning is solved by selecting the best critic, while the continuous control problem is solved by optimizing the actors.
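A minimal sketch of this selection step, assuming one actor-critic pair per discrete contact mode; the names and the use of state-value critics are illustrative assumptions, not the paper's implementation:

```python
import torch

def select_mode_and_action(state, actors, critics):
    """Mixture-of-actor-critic selection: each discrete contact mode has
    its own actor/critic pair. The discrete decision is made by picking
    the critic with the highest predicted value; the matching actor then
    produces the continuous action."""
    with torch.no_grad():
        values = torch.stack([critic(state) for critic in critics])
    mode = int(values.argmax())   # discrete contact-sequence decision
    action = actors[mode](state)  # continuous control from the chosen actor
    return mode, action
```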