Model-based reinforcement learning aims to improve the sample efficiency of policy learning by modeling the dynamics of the environment.
Existing methods mostly penalize dangerous actions a posteriori, meaning the agent is not penalized until it has actually experienced danger.
This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with constrained reinforcement learning (CRL).
Based on this, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller.
The safety constraints commonly used by existing safe reinforcement learning (RL) methods are defined only in expectation over initial states, allowing individual states to be unsafe, which is unsatisfactory for real-world safety-critical tasks.
With a pair of pre- and post-disaster satellite images, building damage assessment aims at predicting the extent of damage to buildings.
Model information can be used to predict future trajectories, so it has great potential for avoiding dangerous regions when applying reinforcement learning (RL) to real-world tasks such as autonomous driving.
MPG contains two types of PG: 1) data-driven PG, which is obtained by directly calculating the derivative of the learned Q-value function with respect to actions, and 2) model-driven PG, which is calculated using BPTT based on the model-predictive return.
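As a toy illustration of the two gradient types (a hypothetical one-dimensional example with hand-derived derivatives, not the paper's implementation): take a learned value Q(s, a) = -(s + a)^2 for the data-driven PG, and known dynamics s' = s + a with reward r(s') = -(s')^2 for the one-step model-driven PG obtained by backpropagating through the model.

```python
# Toy sketch of MPG's two policy-gradient types (illustrative 1-D example).
# Deterministic policy a = pi(s) = w * s, with scalar parameter w.

def data_driven_pg(w, s):
    """Data-driven PG: dQ/dw = (dQ/da) * (da/dw).

    Assumes a learned value function Q(s, a) = -(s + a)**2.
    """
    a = w * s
    dq_da = -2.0 * (s + a)   # derivative of Q w.r.t. the action
    da_dw = s                # derivative of the policy output w.r.t. w
    return dq_da * da_dw

def model_driven_pg(w, s):
    """Model-driven PG: backpropagation through a one-step model (BPTT).

    Assumes known dynamics s' = s + a and reward r(s') = -(s')**2.
    """
    a = w * s
    s_next = s + a
    dr_dsnext = -2.0 * s_next   # derivative of reward w.r.t. next state
    dsnext_da = 1.0             # model Jacobian ds'/da
    da_dw = s
    return dr_dsnext * dsnext_da * da_dw
```

Because the learned Q here exactly matches the one-step model-based return, both routes produce the same gradient; in practice they differ, which is what motivates mixing them.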
Taking a control perspective, we first interpret the penalty method and the Lagrangian method as proportional feedback and integral feedback control, respectively.
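A minimal numeric sketch of this interpretation (function and class names are illustrative, not the paper's code): the penalty term reacts proportionally to the current constraint violation, while the Lagrange multiplier accumulates, i.e. integrates, violations across iterations.

```python
# Constraint handling viewed as feedback control (illustrative sketch).
# violation = J_c - d: how far the expected cost exceeds its limit d.

def penalty_term(violation, k_p=10.0):
    """Penalty method = proportional control: reacts only to the current error."""
    return k_p * max(0.0, violation)

class LagrangianMultiplier:
    """Lagrangian method = integral control: the multiplier sums past errors."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.lam = 0.0

    def update(self, violation):
        # Dual ascent step, projected to keep the multiplier non-negative.
        self.lam = max(0.0, self.lam + self.lr * violation)
        return self.lam
```

Under a persistent violation the proportional penalty stays fixed while the integrated multiplier keeps growing, which is why the integral (Lagrangian) view can drive steady-state constraint violation to zero.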
To address this challenge, we propose a hierarchical behavior planning framework with a set of low-level safe controllers and a high-level reinforcement learning algorithm (H-CtRL) as a coordinator for the low-level controllers.
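The hierarchical idea can be sketched as follows (controller behaviors and the selection interface are assumptions for illustration, not H-CtRL's actual controllers): the high-level policy chooses which low-level safe controller to run, rather than emitting raw actions.

```python
# Sketch of a hierarchical coordinator over low-level safe controllers.
# (Controller names and action format are illustrative assumptions.)

def keep_lane(state):
    """Low-level safe controller: maintain the current lane."""
    return {"steer": 0.0, "throttle": 0.3}

def slow_down(state):
    """Low-level safe controller: decelerate."""
    return {"steer": 0.0, "throttle": -0.5}

CONTROLLERS = [keep_lane, slow_down]

def act(state, high_level_policy):
    """The high-level RL policy returns a controller index; safety is
    delegated to the chosen controller, so the RL action space shrinks
    to a small discrete set of certified behaviors."""
    idx = high_level_policy(state)
    return CONTROLLERS[idx](state)
```

The design choice is that safety guarantees live in the low-level controllers, while the learned component only selects among them.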
Therefore, to incorporate the long-range contextual information, a deep fully convolutional network (FCN) with an efficient non-local module, named ENL-FCN, is proposed for HSI classification.
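A non-local module captures long-range context by letting every position attend to all others. The sketch below is the generic non-local (self-attention) formulation in NumPy; the embedding shapes and residual form are assumptions, not the specific ENL-FCN implementation.

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g):
    """Generic non-local block (illustrative, not the ENL-FCN code).

    x: (N, C) pixel features; w_*: (C, C) learned embedding matrices.
    Each output position aggregates features from ALL positions,
    weighted by pairwise similarity -- long-range context a plain
    convolutional stack cannot see in one layer.
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    attn = theta @ phi.T                            # (N, N) similarities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over positions
    return x + attn @ g                             # residual connection
```

The efficiency concern the abstract alludes to is the (N, N) attention matrix, which is quadratic in the number of pixels; efficient variants restrict or factorize it.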
Current autonomous driving systems are composed of a perception system and a decision system.
A sequential latent environment model is introduced and learned jointly with the reinforcement learning process.
Current methods for long-term trajectory prediction cannot guarantee the physical feasibility of the predicted distribution.
We present the design and implementation of a visual search system for real-time image retrieval on JD.com, the world's third largest and China's largest e-commerce site.
As autonomous vehicles (AVs) need to interact with other road users, it is important to comprehensively understand the dynamic traffic environment, especially the possible future trajectories of surrounding vehicles.
Urban autonomous driving decision making is challenging due to complex road geometry and multi-agent interactions.