1 code implementation • 25 Dec 2024 • Yassine Chemingui, Aryan Deshwal, Honghao Wei, Alan Fern, Janardhan Rao Doppa
Offline safe reinforcement learning (OSRL) involves learning a decision-making policy to maximize rewards from a fixed batch of training data while satisfying pre-defined safety constraints.
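Schematically (generic notation, not necessarily the paper's), OSRL is a constrained policy search restricted to a fixed dataset $\mathcal{D}$:

$$\max_{\pi} \ \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} \gamma^{t}\, r(s_t, a_t)\right] \quad \text{s.t.} \quad \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} \gamma^{t}\, c(s_t, a_t)\right] \le \kappa,$$

where $c$ is a cost signal encoding the safety constraint, $\kappa$ is the safety budget, and $\pi$ must be learned without collecting any trajectories beyond those in $\mathcal{D}$.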
1 code implementation • 16 Dec 2024 • Zijian Gu, Jianwei Ma, Yan Huang, Honghao Wei, Zhanye Chen, Hui Zhang, Wei Hong
Hence, in this paper, we present the radar-camera fusion network with Hybrid Generation and Synchronization (HGSFusion), designed to better fuse radar potentials and image features for 3D object detection.
Ranked #1 on 3D Object Detection (RoI) on View-of-Delft (val)
1 code implementation • 5 Dec 2024 • Keru Chen, Honghao Wei, Zhigang Deng, Sen Lin
The high costs and risks involved in extensive environment interactions hinder the practical application of current online safe reinforcement learning (RL) methods.
1 code implementation • 25 Oct 2024 • Xiyue Peng, Hengquan Guo, Jiawei Zhang, Dongqing Zou, Ziyu Shao, Honghao Wei, Xin Liu
To address this issue, we propose Rectified Policy Optimization (RePO), which replaces the expected safety constraint with critical safety constraints imposed on every prompt.
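Written out (generic notation; a hedged restatement of the sentence above), the replacement is

$$\mathbb{E}_{x \sim \mathcal{D}}\big[C\big(x, \pi(x)\big)\big] \le b \quad \longrightarrow \quad C\big(x, \pi(x)\big) \le b \ \ \text{for every prompt } x,$$

so the policy can no longer average out unsafe responses on some prompts against very safe responses on others.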
no code implementations • 11 Jun 2024 • Qining Zhang, Honghao Wei, Lei Ying
In this paper, we study reinforcement learning from human feedback (RLHF) under an episodic Markov decision process with a general trajectory-wise reward model.
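A standard way to instantiate a trajectory-wise reward model (a common modeling choice, not a claim about this paper's exact formulation) is a Bradley-Terry preference model over whole trajectories:

$$\Pr(\tau_1 \succ \tau_2) = \frac{\exp\!\big(r(\tau_1)\big)}{\exp\!\big(r(\tau_1)\big) + \exp\!\big(r(\tau_2)\big)},$$

where $r(\tau)$ assigns a single scalar to an entire trajectory rather than decomposing additively across steps.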
no code implementations • 1 Jan 2024 • Honghao Wei, Xiyue Peng, Arnob Ghosh, Xin Liu
In theory, we demonstrate that when the actor employs a no-regret optimization oracle, WSAC achieves a number of guarantees: (i) For the first time in the safe offline RL setting, we establish that WSAC can produce a policy that outperforms any reference policy while maintaining the same level of safety, which is critical to designing a safe algorithm for offline RL.
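In schematic form (a hedged restatement in generic notation), guarantee (i) says that the learned policy $\hat{\pi}$ satisfies

$$J_r(\hat{\pi}) \ge J_r(\pi_{\mathrm{ref}}) \quad \text{and} \quad J_c(\hat{\pi}) \le J_c(\pi_{\mathrm{ref}})$$

for any reference policy $\pi_{\mathrm{ref}}$, where $J_r$ and $J_c$ denote expected cumulative reward and safety cost: never worse on reward, never less safe.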
no code implementations • 22 Dec 2023 • Honghao Wei, Xin Liu, Lei Ying
This paper studies safe Reinforcement Learning (safe RL) with linear function approximation and under hard instantaneous constraints where unsafe actions must be avoided at each step.
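Under linear function approximation, a hard instantaneous constraint of this kind typically takes the form (a schematic statement)

$$\big\langle \phi(s_t, a_t), \theta^{*} \big\rangle \le \delta \quad \text{for every step } t,$$

where $\phi$ is a known feature map, $\theta^{*}$ parameterizes the unknown safety cost, and $\delta$ is the threshold; unlike cumulative constraints, even a single violating action is disallowed.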
no code implementations • 27 Sep 2023 • Zihan Zhou, Honghao Wei, Lei Ying
PRI achieves three objectives: (i) PRI is a model-free algorithm; (ii) it outputs an approximately optimal policy with high probability at the end of learning; and (iii) PRI guarantees $\tilde{\mathcal{O}}(H\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(H^4 \sqrt{SA}K^{\frac{4}{5}})$ under a model-free algorithm, where $H$ is the length of each episode, $S$ is the number of states, $A$ is the number of actions, and the total number of episodes during learning is $2K+\tilde{\mathcal{O}}(K^{0.25})$. We further present a matching lower bound via an example showing that, under any online learning algorithm, there exists a well-separated CMDP instance such that either the regret or the violation has to be $\Omega(H\sqrt{K})$, which matches the upper bound up to a polylogarithmic factor.
no code implementations • 10 Mar 2023 • Honghao Wei, Arnob Ghosh, Ness Shroff, Lei Ying, Xingyu Zhou
We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost).
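For reference, the episodic CMDP objective in the stationary case reads (standard notation)

$$\max_{\pi} \ \mathbb{E}_{\pi}\!\left[\sum_{h=1}^{H} r_h(s_h, a_h)\right] \quad \text{s.t.} \quad \mathbb{E}_{\pi}\!\left[\sum_{h=1}^{H} u_h(s_h, a_h)\right] \ge \rho,$$

where $u_h$ is the utility (or negated cost) and $\rho$ the required level; in the non-stationary setting studied here, the rewards, utilities, and transition kernels may additionally drift across episodes.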
no code implementations • 13 Dec 2022 • Xin Liu, Honghao Wei, Lei Ying
The proposed algorithm is distributed in two aspects: (i) the learned policy is a distributed policy that maps a local state of an agent to its local action and (ii) the learning/training is distributed, during which each agent updates its policy based on its own and neighbors' information.
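A minimal sketch of this two-fold structure (illustrative code with hypothetical names, not the paper's algorithm): each agent owns a local policy over its own state and updates it using only its own and its neighbors' feedback.

```python
import numpy as np

class LocalAgent:
    """Toy agent: a local softmax policy over (local_state, local_action)."""

    def __init__(self, n_states, n_actions, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.logits = np.zeros((n_states, n_actions))
        self.lr = lr

    def act(self, local_state):
        # Distributed execution: the action depends only on the agent's own state.
        logits = self.logits[local_state]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return self.rng.choice(len(p), p=p)

    def update(self, local_state, local_action, own_feedback, neighbor_feedback):
        # Distributed training: the learning signal combines the agent's own
        # feedback with feedback received from its neighbors only.
        signal = own_feedback + sum(neighbor_feedback)
        self.logits[local_state, local_action] += self.lr * signal
```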
no code implementations • 3 Jun 2021 • Honghao Wei, Xin Liu, Lei Ying
This paper presents the first model-free, simulator-free reinforcement learning algorithm for Constrained Markov Decision Processes (CMDPs) with sublinear regret and zero constraint violation.
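Here regret and constraint violation carry their usual episodic-CMDP meanings (one common convention, stated for context):

$$\mathrm{Regret}(K) = \sum_{k=1}^{K} \Big( V^{\pi^{*}}_{r} - V^{\pi_k}_{r} \Big), \qquad \mathrm{Violation}(K) = \left[\sum_{k=1}^{K} \Big( \rho - V^{\pi_k}_{c} \Big)\right]_{+},$$

where $\pi^{*}$ is the best policy satisfying the constraint, $\pi_k$ is the policy used in episode $k$, and $[\cdot]_{+} = \max(\cdot, 0)$; zero constraint violation means the second quantity is zero.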
2 code implementations • 4 Oct 2020 • Honghao Wei, Lei Ying
In this paper, we propose a new type of Actor, named forward-looking Actor or FORK for short, for Actor-Critic algorithms.
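The forward-looking idea, in a minimal sketch (my paraphrase under assumptions; `dynamics` and `reward_model` are learned networks with hypothetical names, and this is not the paper's exact loss): the actor peeks one step ahead through a learned model and also maximizes the value of the predicted next state.

```python
import torch

def fork_actor_loss(actor, critic, dynamics, reward_model,
                    states: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Illustrative one-step forward-looking actor loss.

    dynamics(s, a)     -> predicted next state s'
    reward_model(s, a) -> predicted immediate reward
    """
    actions = actor(states)
    q_now = critic(states, actions)                # standard actor term
    next_states = dynamics(states, actions)        # forecast s' via the learned model
    next_actions = actor(next_states)
    q_ahead = reward_model(states, actions) + gamma * critic(next_states, next_actions)
    # Maximize both the current Q-value and the forecast one-step-ahead return.
    return -(q_now + q_ahead).mean()
```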
no code implementations • 4 Mar 2019 • Honghao Wei, Xiaohan Kang, Weina Wang, Lei Ying
The algorithm consists of an offline machine learning algorithm for learning the probabilistic information spreading model and an online optimal stopping algorithm to detect misinformation.
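The online part can be viewed as a sequential detection problem (a schematic view, not the paper's exact rule): track the posterior belief that the spreading post is misinformation and stop at the first time it crosses a threshold,

$$\tau = \min\big\{ t : \Pr(\text{misinformation} \mid \mathcal{F}_t) \ge \eta \big\},$$

where $\mathcal{F}_t$ is the spreading history observed up to time $t$ under the learned model and $\eta$ trades detection speed against false positives.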