1 code implementation • 16 Jan 2024 • Zhepeng Cen, Zuxin Liu, Zitong Wang, Yihang Yao, Henry Lam, Ding Zhao
Offline reinforcement learning (RL) offers a promising direction for learning policies from pre-collected datasets without requiring further interactions with the environment.
no code implementations • 23 Dec 2023 • Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, Ding Zhao
Leveraging insights from this framework and recognizing the significance of \textit{redundant} and \textit{conflicting} constraint conditions, we introduce the Gradient Shaping (GradS) method for general Lagrangian-based safe RL algorithms to improve the training efficiency in terms of both reward and constraint satisfaction.
3 code implementations • 15 Jun 2023 • Zuxin Liu, Zijian Guo, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao
This paper presents a comprehensive benchmarking suite tailored to offline safe reinforcement learning (RL) challenges, aiming to foster progress in the development and evaluation of safe learning algorithms in both the training and deployment phases.
1 code implementation • 14 Feb 2023 • Zuxin Liu, Zijian Guo, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, Ding Zhao
Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment.
no code implementations • 16 Sep 2022 • Mengdi Xu, Zuxin Liu, Peide Huang, Wenhao Ding, Zhepeng Cen, Bo Li, Ding Zhao
A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including {robustly} handling uncertainties, satisfying {safety} constraints to avoid catastrophic failures, and {generalizing} to unseen scenarios during deployments.
1 code implementation • 29 May 2022 • Zuxin Liu, Zijian Guo, Zhepeng Cen, huan zhang, Jie Tan, Bo Li, Ding Zhao
One interesting and counter-intuitive finding is that the maximum reward attack is strong, as it can both induce unsafe behaviors and make the attack stealthy by maintaining the reward.
no code implementations • 4 Apr 2022 • Mansur Arief, Zhepeng Cen, Zhenyuan Liu, Zhiyuang Huang, Henry Lam, Bo Li, Ding Zhao
In this work, we present Deep Importance Sampling (Deep IS) framework that utilizes a deep neural network to obtain an efficient IS that is on par with the state-of-the-art, capable of reducing the required sample size 43 times smaller than the naive sampling method to achieve 10% relative error and while producing an estimate that is much less conservative.
2 code implementations • 28 Jan 2022 • Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao
Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Hancheng Cao, Mengjie Cheng, Zhepeng Cen, Daniel A. McFarland, Xiang Ren
We extract scientific concepts (i. e., phrases) from corpora as instantiations of "research ideas", create concept-level features as motivated by literature, and then follow the trajectories of over 450, 000 new concepts (emerged from 1995-2014) to identify factors that lead only a small proportion of these ideas to be used in inventions and drug trials.