no code implementations • 14 Dec 2024 • Jia Hu, Xuerun Yan, Tian Xu, Haoran Wang
Hence, the proposed HCPI-RL planner has the following features: i) evolutionary automated driving with monotonic performance enhancement; ii) the capability to handle emergency scenarios; iii) enhanced decision-making optimality.
no code implementations • 29 Aug 2024 • Ziniu Li, Congliang Chen, Tian Xu, Zeyu Qin, Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo
For the SFT of Llama-3-8B models, GEM outperforms CE in several aspects.
3 code implementations • 27 Dec 2023 • Zijie Yang, Yongjing Yin, Chaojun Kong, Tiange Chi, Wufan Tao, Yue Zhang, Tian Xu
Natural Medicinal Materials (NMMs) have a long history of global clinical applications and a wealth of records and knowledge.
1 code implementation • 17 Dec 2023 • Ziniu Li, Tian Xu, Yang Yu
These methods, either explicitly or implicitly, learn a reward model from preference data and differ in the data used for policy optimization to unlock the generalization ability of the reward model.
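As a minimal illustration of the reward-learning step described above, the sketch below fits a reward model on pairwise preference data with the standard Bradley-Terry objective; the `reward_model` interface and tensor shapes are assumptions for exposition, not the paper's implementation.

```python
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """Bradley-Terry loss on a batch of preference pairs.

    `reward_model` is assumed to map encoded (prompt + response) batches
    to scalar rewards; `chosen` and `rejected` hold the preferred and
    dispreferred responses for the same prompts.
    """
    r_chosen = reward_model(chosen)      # shape: (batch,)
    r_rejected = reward_model(rejected)  # shape: (batch,)
    # Maximize the log-probability that the chosen response is ranked higher.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```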
2 code implementations • 16 Oct 2023 • Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
ReMax saves about 46% of the GPU memory that PPO uses when training a 7B model, and enables training on A800-80GB GPUs without the memory-saving offloading technique that PPO requires.
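The memory saving comes from dropping PPO's learned value network; as a hedged sketch of the ReMax-style update, the greedy response's reward serves as a per-prompt baseline for a REINFORCE gradient (the `policy.sample`, `policy.greedy`, `policy.log_prob`, and `reward_fn` interfaces below are assumptions, not the released code):

```python
import torch

def remax_loss(policy, reward_fn, prompts):
    """One ReMax-style policy-gradient loss (sketch with assumed interfaces).

    The reward of the greedy response acts as a baseline in place of a
    learned critic, which is what removes the value network from memory.
    """
    sampled = policy.sample(prompts)   # stochastic responses
    greedy = policy.greedy(prompts)    # greedy responses used as the baseline
    with torch.no_grad():
        advantage = reward_fn(prompts, sampled) - reward_fn(prompts, greedy)
    log_probs = policy.log_prob(prompts, sampled)  # summed over tokens
    # REINFORCE with the greedy-reward baseline; return a loss to minimize.
    return -(advantage * log_probs).mean()
```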
no code implementations • 9 Oct 2023 • Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu
MOREC learns a generalizable dynamics reward function from offline data, which is subsequently employed as a transition filter in any offline MBRL method: when generating a transition, the dynamics model proposes a batch of candidates and keeps the one with the highest dynamics reward value.
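A hedged sketch of this filtering step is below; `dynamics_model.sample` and `dynamics_reward` are assumed interfaces standing in for whatever model and learned reward an offline MBRL method provides:

```python
import torch

def filtered_transition(dynamics_model, dynamics_reward, state, action,
                        n_candidates=8):
    """MOREC-style transition filtering (sketch with assumed interfaces).

    The dynamics model proposes a batch of candidate transitions for
    (state, action); the learned dynamics reward scores each candidate,
    and the highest-scoring transition is kept for the model rollout.
    """
    next_states, rewards = dynamics_model.sample(state, action, n_candidates)
    scores = dynamics_reward(state, action, next_states)  # (n_candidates,)
    best = torch.argmax(scores)
    return next_states[best], rewards[best]
```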
1 code implementation • 11 Jun 2023 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed.
1 code implementation • 27 Jan 2023 • Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo
This paper considers a situation where, besides the small amount of expert data, a supplementary dataset is available, which can be collected cheaply from sub-optimal policies.
no code implementations • 3 Aug 2022 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
Imitation learning learns a policy from expert trajectories.
no code implementations • 19 Jun 2022 • Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu
In this survey, we review MBRL with a focus on recent progress in deep RL.
no code implementations • 1 Jun 2022 • Chengxing Jia, Hao Yin, Chenxiao Gao, Tian Xu, Lei Yuan, Zongzhang Zhang, Yang Yu
Model-based offline optimization with a dynamics-aware policy provides a new perspective on policy learning and out-of-distribution generalization, where the learned policy can adapt to the different dynamics enumerated at the training stage.
no code implementations • 22 Mar 2022 • Ziniu Li, Tian Xu, Yang Yu
In particular, we demonstrate that the sample complexity of the target Q-learning algorithm in [Lee and He, 2020] is $\widetilde{\mathcal O}(|\mathcal S|^2|\mathcal A|^2 (1-\gamma)^{-5}\varepsilon^{-2})$.
no code implementations • 5 Feb 2022 • Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo
First, we show that ValueDice can reduce to BC in the offline setting.
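For reference, the behavioral cloning objective that ValueDice collapses to here is standard maximum likelihood over the expert dataset $\mathcal{D}_{\mathrm{exp}}$ (notation ours):

$$\min_{\pi}\; \mathbb{E}_{(s,a)\sim\mathcal{D}_{\mathrm{exp}}}\left[-\log \pi(a \mid s)\right].$$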
no code implementations • 19 Jun 2021 • Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo
For some MDPs, we show that vanilla AIL has a worse sample complexity than BC.
no code implementations • 18 May 2021 • Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu, Yang Yu
To tackle the issue of limited action execution in RL, this paper first formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed for a limited time.
no code implementations • 31 Mar 2021 • Xu Chen, Bangguo Yin, Songqiang Chen, Haifeng Li, Tian Xu
The series strategy avoids RS-m inconsistency, since the inputs are high-resolution, large-scale RSIs, and it reduces the distribution gap in multi-scale map generation by keeping pixel distributions similar across the multi-scale maps.
no code implementations • 30 Nov 2020 • Lingxiao Wang, Tian Xu, Till Hannes Stoecker, Horst Stoecker, Yin Jiang, Kai Zhou
As the COVID-19 pandemic continues to ravage the world, it is critical to provide timely, multi-level risk prediction of COVID-19.
no code implementations • NeurIPS 2020 • Tian Xu, Ziniu Li, Yang Yu
In this paper, we first analyze the value gap between the expert policy and the imitated policies produced by two imitation methods, behavioral cloning and generative adversarial imitation.
no code implementations • 20 Jul 2020 • Tian Xu, Jennifer White, Sinan Kalkan, Hatice Gunes
Recognition of emotional expressions and affect from facial images is a well-studied problem in affective computing and computer vision, with many available datasets containing facial images and corresponding expression labels.
no code implementations • 16 Nov 2019 • Tian Xu, Ziniu Li, Yang Yu
We also show that the framework yields a value discrepancy for GAIL of order $O((1-\gamma)^{-1})$.
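For context, the companion behavioral cloning analysis in this line of work incurs a quadratic dependence on the effective horizon, so the comparison reads schematically as follows (with $\varepsilon$ denoting the imitation error on expert data; notation ours):

$$V^{\pi_E} - V^{\pi_{\mathrm{BC}}} \le O\!\left(\frac{\varepsilon}{(1-\gamma)^2}\right), \qquad V^{\pi_E} - V^{\pi_{\mathrm{GAIL}}} \le O\!\left(\frac{\varepsilon}{1-\gamma}\right).$$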
no code implementations • 19 Nov 2018 • Tian Xu, Jiayu Zhan, Oliver G. B. Garrod, Philip H. S. Torr, Song-Chun Zhu, Robin A. A. Ince, Philippe G. Schyns
However, understanding the information represented and processed in CNNs remains challenging in most cases.
no code implementations • 31 Aug 2013 • Ryan Wen Liu, Tian Xu
In this work, a new constrained hybrid variational deblurring model is developed by combining non-convex first- and second-order total variation regularizers.
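A schematic form of such a constrained hybrid model is sketched below; the weights $\alpha, \beta$ and the noise bound $c$ are illustrative assumptions rather than the paper's exact formulation:

$$\min_{u}\; \alpha\,\mathrm{TV}_1(u) + \beta\,\mathrm{TV}_2(u) \quad \text{subject to} \quad \|Ku - f\|_2^2 \le c,$$

where $K$ is the blur operator, $f$ is the observed blurry image, and $\mathrm{TV}_1$, $\mathrm{TV}_2$ stand for non-convex variants of the first- and second-order total variation.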