Safe Exploration

35 papers with code • 0 benchmarks • 0 datasets

Safe Exploration is an approach to collecting ground-truth data by safely interacting with the environment.

Source: Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems

Libraries

Use these libraries to find Safe Exploration models and implementations

Most implemented papers

Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach

subinh/deeprl_safety_specification 24 Feb 2020

Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
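The restraint described above rests on a decrease condition: a Lyapunov function must not increase along transitions induced by the candidate policy. A minimal toy sketch of such a check (the function names and the linear toy system are assumptions for illustration, not the paper's construction):

```python
import numpy as np

def lyapunov_decrease_holds(V, step, states, margin=0.0):
    """Check that V decreases along one-step transitions from each state.

    V:      candidate Lyapunov function, maps state -> float
    step:   one-step closed-loop model under the policy, maps state -> next state
    states: sample of states on which to test the condition
    """
    return all(V(step(s)) <= V(s) - margin for s in states)

# Toy example: a contracting linear system x' = 0.9 x with V(x) = x^2.
states = [np.array([x]) for x in np.linspace(-1.0, 1.0, 20)]
ok = lyapunov_decrease_holds(lambda s: float(s @ s),
                             lambda s: 0.9 * s,
                             states)
print(ok)  # True: V shrinks under the contraction
```

A policy-improvement step would only be accepted if the check passes; an expanding system (e.g. `x' = 1.1 x`) fails it.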

Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent

ejcatt/aixijs_mentee 5 Jun 2020

Much work in reinforcement learning uses an ergodicity assumption to avoid this problem.

Enforcing Almost-Sure Reachability in POMDPs

sjunges/shielding-POMDPs 30 Jun 2020

Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information.

Verifiably Safe Exploration for End-to-End Reinforcement Learning

IBM/vsrl-framework 2 Jul 2020

We also prove that our method of enforcing the safety constraints preserves all safe policies from the original environment.
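The safe-policy-preservation property above is the hallmark of shield-style constraint enforcement: if the shield only removes unsafe actions, every policy that was already safe behaves identically. A hypothetical toy sketch (the `shield` helper and the 1-D interval constraint are illustrative assumptions, not the VSRL framework's API):

```python
def shield(state, actions, is_safe_transition, fallback):
    """Restrict the agent's choice to actions whose successor is safe.

    is_safe_transition(state, action) -> bool is a (hypothetical) checker,
    e.g. a verified model of the constraint. Safe actions pass through
    untouched, so any policy that only took safe actions is preserved.
    """
    safe = [a for a in actions if is_safe_transition(state, a)]
    return safe if safe else [fallback]

# Toy 1-D example: position must stay in [-2, 2]; actions move by -1/0/+1.
allowed = shield(2, [-1, 0, 1],
                 lambda s, a: -2 <= s + a <= 2,
                 fallback=0)
print(allowed)  # [-1, 0]: moving right would leave the safe interval
```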

Provably Safe PAC-MDP Exploration Using Analogies

locuslab/ase 7 Jul 2020

A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure).

Neurosymbolic Reinforcement Learning with Formally Verified Exploration

gavlegoat/safe-learning NeurIPS 2020

We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces.

Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

ml-jku/align-rudder 29 Sep 2020

For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks.
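Reward redistribution of this kind can be sketched as differencing a model's running prediction of the final return, so reward lands on the steps where the prediction jumps (i.e. where a sub-task is accomplished). A toy sketch under that assumption, not Align-RUDDER's actual implementation:

```python
def redistribute_return(predicted_returns, final_return):
    """RUDDER-style reward redistribution sketch.

    predicted_returns[t] is a model's prediction of the episode's final
    return after seeing the trajectory up to step t. The redistributed
    reward at step t is the difference of consecutive predictions; any
    prediction error goes to the last step so the total is preserved.
    """
    rewards = [predicted_returns[0]]
    for t in range(1, len(predicted_returns)):
        rewards.append(predicted_returns[t] - predicted_returns[t - 1])
    rewards[-1] += final_return - predicted_returns[-1]  # correction term
    return rewards

rs = redistribute_return([0.0, 0.0, 1.0, 1.0], final_return=1.0)
print(rs)  # [0.0, 0.0, 1.0, 0.0] -- all reward moved to the decisive step
```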

Autonomous UAV Exploration of Dynamic Environments via Incremental Sampling and Probabilistic Roadmap

Zhefan-Xu/DEP 14 Oct 2020

Autonomous exploration requires robots to generate informative trajectories iteratively.
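A probabilistic roadmap of the kind mentioned above can be sketched in a few lines: sample collision-free points and connect nearby pairs whose connecting edge is also free. The workspace, obstacle, and coarse 3-point edge check below are toy assumptions, not the paper's method:

```python
import math
import random

def build_prm(n_samples, radius, sample_free, edge_free):
    """Minimal probabilistic-roadmap sketch: sample free points and connect
    pairs within `radius` whenever the straight edge between them is free."""
    nodes = [sample_free() for _ in range(n_samples)]
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if (math.dist(nodes[i], nodes[j]) <= radius
                    and edge_free(nodes[i], nodes[j])):
                edges[i].append(j)
                edges[j].append(i)
    return nodes, edges

# Toy workspace: unit square with a circular obstacle at the centre.
random.seed(0)
free = lambda p: math.dist(p, (0.5, 0.5)) > 0.2

def sample_free():
    while True:
        p = (random.random(), random.random())
        if free(p):
            return p

# Coarse edge check: test three interpolated points along the segment.
nodes, edges = build_prm(
    30, 0.3, sample_free,
    lambda a, b: all(free(((1 - t) * a[0] + t * b[0],
                           (1 - t) * a[1] + t * b[1]))
                     for t in (0.25, 0.5, 0.75)))
```

Incremental variants grow the same roadmap over time by adding new samples as the robot observes more of the environment.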

Safe Continuous Control with Constrained Model-Based Policy Optimization

anyboby/mujoco_safety_gym 14 Apr 2021

Further, we provide theoretical and empirical analyses regarding the implications of model-usage on constrained policy optimization problems and introduce a practical algorithm that accelerates policy search with model-generated data.

Infinite Time Horizon Safety of Bayesian Neural Networks

mlech26l/bayesian_nn_safety NeurIPS 2021

Bayesian neural networks (BNNs) place distributions over the weights of a neural network to model uncertainty in the data and the network's prediction.
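Placing distributions over weights yields a predictive distribution by Monte-Carlo sampling. A toy sketch with a single Bayesian linear layer and independent Gaussian weight posteriors (all names and numbers are illustrative assumptions, not the paper's model):

```python
import numpy as np

def bnn_predict(x, w_mean, w_std, n_samples=1000, rng=None):
    """Monte-Carlo prediction with a toy Bayesian linear layer: draw weight
    samples from independent Gaussians and return predictive mean and std."""
    rng = rng or np.random.default_rng(0)
    ws = rng.normal(w_mean, w_std, size=(n_samples,) + np.shape(w_mean))
    ys = ws @ x                       # one scalar output per weight sample
    return ys.mean(), ys.std()

mean, std = bnn_predict(np.array([1.0, 2.0]),
                        w_mean=np.array([0.5, -0.3]),
                        w_std=np.array([0.1, 0.1]))
# mean is close to 0.5*1 - 0.3*2 = -0.1; std reflects weight uncertainty
```

The predictive spread `std` is what safety analyses over infinite horizons must bound: uncertain weights propagate into uncertain trajectories.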