Recent studies have focused on constructing Instruction Fine-Tuning (IFT) data through medical knowledge graphs to enrich the interactive medical knowledge of LLMs.
Large language models (LLMs) encode a vast amount of semantic knowledge and possess remarkable understanding and reasoning capabilities.
Experiments on quadruped and humanoid robots demonstrate that the learned policy is robust against local motor malfunctions and can be transferred to new tasks.
With strong reasoning capabilities and a generic understanding of the world, Large Language Models (LLMs) have shown great potential for building versatile embodied decision-making agents capable of performing diverse tasks.
To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms.
In this paper, we propose a new framework for learning robust, agile and natural legged locomotion skills over challenging terrain.
One of the key challenges in deploying RL to real-world applications is adapting to variations of unknown environment contexts, such as changing terrains in robotic tasks and fluctuating bandwidth in congestion control.
In this paper, we consider the case where the target task is mismatched with, but similar to, that of the expert.
Furthermore, we build a safe RL framework to resolve constraints required by the DRC and its corresponding shield policy.
This paper addresses such a challenge by Decomposed Mutual INformation Optimization (DOMINO) for context learning, which explicitly learns a disentangled context to maximize the mutual information between the context and historical trajectories, while minimizing the state transition prediction error.
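As a loose illustration of the kind of objective described above, the following minimal NumPy sketch combines an InfoNCE-style lower bound on mutual information with a state-transition prediction error. The function names, the `beta` weight, and the choice of estimator are all assumptions for illustration, not DOMINO's actual decomposition.

```python
import numpy as np

def info_nce_lower_bound(scores):
    """InfoNCE-style lower bound on mutual information.

    scores[i, j] is a learned similarity between context i and
    trajectory j; matching pairs lie on the diagonal.
    """
    n = scores.shape[0]
    # log-softmax over each row: how well context i identifies its own trajectory
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return log_probs[np.arange(n), np.arange(n)].mean() + np.log(n)

def mi_plus_prediction_loss(scores, pred_next_states, true_next_states, beta=1.0):
    """Combined objective: maximize the MI bound while minimizing
    the state-transition prediction error (loss is minimized)."""
    mi = info_nce_lower_bound(scores)
    pred_err = np.mean((pred_next_states - true_next_states) ** 2)
    return -mi + beta * pred_err
```

With well-separated contexts (large diagonal scores), the bound approaches its maximum of log(batch size), so the MI term saturates while the prediction term keeps shaping the context.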
End-to-end autonomous driving provides a feasible way to automatically maximize overall driving system performance by directly mapping the raw pixels from a front-facing camera to control signals.
Experimental results demonstrate that our proposed method achieves policy generalization to unseen compositional tasks in a zero-shot manner.
Choosing an appropriate parameter set for the designed controller is critical for the final performance but usually requires a tedious and careful tuning process, which implies a strong need for automatic tuning methods.
In this paper, we propose a contact-safe reinforcement learning framework for contact-rich robot manipulation, which maintains safety in both the task space and joint space.
In visual control, learning a state representation that transfers across different control tasks is important for reducing the number of training samples required.
Furthermore, we show that the learned belief states can be plugged into downstream RL algorithms to improve performance.
Recent studies incorporate feasible sets into CRL with energy-based methods such as the control barrier function (CBF) and the safety index (SI), but they rely on prior conservative estimates of the feasible sets, which harms the performance of the learned policy.
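For readers unfamiliar with energy-based safety certificates, a minimal discrete-time control-barrier-style condition can be sketched as follows; the `alpha` decay rate and the discrete-time form are illustrative assumptions, not the formulation of any specific method above.

```python
def cbf_safe(h_curr, h_next, alpha=0.1):
    """Discrete-time control-barrier-style condition.

    h is an energy/barrier function with h >= 0 on the safe set.
    Requiring h(x_{t+1}) - h(x_t) >= -alpha * h(x_t) keeps the
    safe set forward-invariant: h may decay gradually but cannot
    drop below zero in one step faster than the allowed rate.
    """
    return h_next - h_curr >= -alpha * h_curr
```

A safety filter would accept only those candidate actions whose predicted next state satisfies this inequality.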
Further, we overcome these challenges by introducing a novel approach, Scale-Equivalent Distillation (SED), a simple yet effective end-to-end knowledge distillation framework that is robust to large variance in object size and to class imbalance.
Recently advanced evolution-based zeroth-order optimization methods and policy-gradient-based first-order methods are two promising alternatives for solving reinforcement learning (RL) problems, with complementary advantages.
Model-based reinforcement learning aims to improve the sample efficiency of policy learning by modeling the dynamics of the environment.
Existing methods mostly apply a posterior penalty to dangerous actions, meaning that the agent is not penalized until it has already experienced danger.
This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with CRL.
Based on this, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller.
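The control-theoretic reading above can be sketched in a few lines: treat the constraint violation as an error signal, apply a proportional term for the penalty method, and accumulate an integral term for the Lagrange multiplier. The gains `kp` and `ki` and the projection to nonnegative multipliers are illustrative assumptions, not the exact update rules of the method described.

```python
def pid_style_penalty(violations, kp=1.0, ki=0.1):
    """Interpret constraint handling as feedback control.

    Penalty method   ~ proportional term: kp * current violation.
    Lagrangian method ~ integral term: the multiplier lam integrates
    past violations (projected to stay nonnegative).
    Returns the per-step penalty signal and the final multiplier.
    """
    lam = 0.0
    penalties = []
    for v in violations:
        lam = max(0.0, lam + ki * v)              # integral (Lagrangian) update
        penalties.append(kp * max(v, 0.0) + lam)  # proportional + integral
    return penalties, lam
```

Under a persistent violation the integral term grows without bound, which is exactly why the Lagrangian method can drive steady-state violations to zero where a fixed proportional penalty cannot.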
The safety constraints commonly used by existing safe reinforcement learning (RL) methods are defined only in expectation over initial states, so individual states may still be unsafe, which is unsatisfactory for real-world safety-critical tasks.
With a pair of pre- and post-disaster satellite images, building damage assessment aims at predicting the extent of damage to buildings.
Model information can be used to predict future trajectories, so it has great potential for avoiding dangerous regions when applying reinforcement learning (RL) to real-world tasks such as autonomous driving.
MPG contains two types of PG: 1) data-driven PG, which is obtained by directly calculating the derivative of the learned Q-value function with respect to actions, and 2) model-driven PG, which is calculated using BPTT based on the model-predictive return.
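The data-driven PG term above can be illustrated with a tiny sketch that differentiates a stand-in Q-value function with respect to the action. A real implementation would backpropagate through the critic network analytically rather than use finite differences; `q_fn` here is a hypothetical callable, not part of MPG.

```python
import numpy as np

def data_driven_pg(q_fn, state, action, eps=1e-5):
    """Data-driven policy gradient: estimate dQ(s, a)/da with
    central finite differences around the current action."""
    action = np.asarray(action, dtype=float)
    grad = np.zeros_like(action)
    for i in range(action.size):
        da = np.zeros_like(action)
        da[i] = eps
        grad[i] = (q_fn(state, action + da) - q_fn(state, action - da)) / (2 * eps)
    return grad
```

The model-driven term would instead unroll a learned dynamics model for several steps and backpropagate the predicted return through time (BPTT) to the action.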
Taking a control perspective, we first interpret the penalty method and the Lagrangian method as proportional feedback and integral feedback control, respectively.
To address this challenge, we propose a hierarchical behavior planning framework with a set of low-level safe controllers and a high-level reinforcement learning algorithm (H-CtRL) as a coordinator for the low-level controllers.
Safety is essential for reinforcement learning (RL) applied in real-world situations.
Therefore, to incorporate the long-range contextual information, a deep fully convolutional network (FCN) with an efficient non-local module, named ENL-FCN, is proposed for HSI classification.
Current autonomous driving systems are composed of a perception system and a decision system.
A sequential latent environment model is introduced and learned jointly with the reinforcement learning process.
Current methods for long-term trajectory prediction cannot guarantee the physical feasibility of the predicted distribution.
We present the design and implementation of a visual search system for real-time image retrieval on JD.com, the world's third-largest and China's largest e-commerce site.
As autonomous vehicles (AVs) need to interact with other road users, it is of importance to comprehensively understand the dynamic traffic environment, especially the future possible trajectories of surrounding vehicles.
Urban autonomous driving decision making is challenging due to complex road geometry and multi-agent interactions.