We characterize the feasible set by the established self-consistency condition, then a safety value function can be learned and used as constraints in CRL.
Both the primal and dual estimators are learned from data using supervised learning techniques, and the explicit sample size is provided, which enables us to guarantee the quality of each learned estimator in terms of feasibility and optimality.
Zeroth-order optimization methods and policy gradient based first-order methods are two promising alternatives to solve reinforcement learning (RL) problems with complementary advantages.
Existing methods mostly use the posterior penalty for dangerous actions, which means that the agent is not penalized until experiencing danger.
This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with CRL.
This paper develops the dynamic permutation state representation in the framework of integrated decision and control (IDC) to handle signalized intersections with mixed traffic flows.
Autonomous driving at intersections is one of the most complicated and accident-prone traffic scenarios, especially with mixed traffic participants such as vehicles, bicycles and pedestrians.
Based on this, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller.
In this paper, we present an interpretable and computationally efficient framework called integrated decision and control (IDC) for automated vehicles, which decomposes the driving task into static path planning and dynamic optimal tracking that are structured hierarchically.
The equivalence holds given certain conditions about initial state distributions and policy formats, in which the system state is the estimation error, control input is the filter gain, and control objective function is the accumulated estimation error.
In this paper, we propose an RL-based end-to-end decision-making method under a framework of offline training and online correction, called the Shielded Distributional Soft Actor-critic (SDSAC).
Model information can be used to predict future trajectories, so it has huge potential to avoid dangerous region when implementing reinforcement learning (RL) on real-world tasks, like autonomous driving.
This paper proposes an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems.
MPG contains two types of PG: 1) data-driven PG, which is obtained by directly calculating the derivative of the learned Q-value function with respect to actions, and 2) model-driven PG, which is calculated using BPTT based on the model-predictive return.
This paper proposes an offline control algorithm, called Recurrent Model Predictive Control (RMPC), to solve large-scale nonlinear finite-horizon optimal control problems.
Taking a control perspective, we first interpret the penalty method and the Lagrangian method as proportional feedback and integral feedback control, respectively.
The simulation results show that the TPI algorithm can converge to the optimal solution for the linear plant, and has high resistance to disturbances for the nonlinear plant.
The simulations of two scenarios for autonomous vehicles confirm we can ensure safety while achieving fast learning.
This paper presents a mixed reinforcement learning (mixed RL) algorithm by simultaneously using dual representations of environmental dynamics to search the optimal policy with the purpose of improving both learning accuracy and training speed.
In this paper, we introduce the minimax formulation and distributional framework to improve the generalization ability of RL algorithms and develop the Minimax Distributional Soft Actor-Critic (Minimax DSAC) algorithm.
A sequential latent environment model is introduced and learned jointly with the reinforcement learning process.
In reinforcement learning (RL), function approximation errors are known to easily lead to the Q-value overestimations, thus greatly reducing policy performance.
Reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks.
CADP linearizes the constrained optimization problem locally into a quadratically constrained linear programming problem, and then obtains the optimal update of the policy network by solving its dual problem.
This paper proposes the Deep Generalized Policy Iteration (DGPI) algorithm to find the infinite horizon optimal control policy for general nonlinear continuous-time systems with known dynamics.
As autonomous vehicles (AVs) need to interact with other road users, it is of importance to comprehensively understand the dynamic traffic environment, especially the future possible trajectories of surrounding vehicles.
This paper presents a distributed model predictive control (DMPC) algorithm for heterogeneous vehicle platoons with unidirectional topologies and a p r i o r i unknown desired set point.