no code implementations • 14 Mar 2023 • Han Zheng, Xufang Luo, Pengfei Wei, Xuan Song, Dongsheng Li, Jing Jiang
In this paper, we consider an offline-to-online setting where the agent is first trained on an offline dataset and then trained online, and propose a framework called Adaptive Policy Learning to effectively take advantage of both offline and online data.
no code implementations • 17 Jun 2022 • Kerong Wang, Hanye Zhao, Xufang Luo, Kan Ren, Weinan Zhang, Dongsheng Li
Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment.
no code implementations • 19 May 2022 • Zhengyu Yang, Kan Ren, Xufang Luo, Minghuan Liu, Weiqing Liu, Jiang Bian, Weinan Zhang, Dongsheng Li
Motivated by the strong accuracy and generalization of ensemble methods in supervised learning (SL), we design a robust and applicable method named Ensemble Proximal Policy Optimization (EPPO), which learns ensemble policies in an end-to-end manner.
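As a rough illustration of acting with an ensemble policy, the sketch below averages the action distributions of several policy heads; the linear heads, uniform averaging, and discrete action space are illustrative assumptions, not EPPO's actual architecture or training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class EnsemblePolicy:
    """Toy ensemble of K linear policy heads over a discrete action space.

    Each head maps a state to action logits; the ensemble acts by
    averaging the heads' action distributions (one plausible aggregation
    rule -- EPPO's actual rule and training loss may differ).
    """
    def __init__(self, state_dim, n_actions, k=4):
        self.heads = [rng.normal(size=(n_actions, state_dim)) for _ in range(k)]

    def action_probs(self, state):
        probs = [softmax(W @ state) for W in self.heads]
        return np.mean(probs, axis=0)  # uniform average over members

    def act(self, state):
        p = self.action_probs(state)
        return rng.choice(len(p), p=p)

policy = EnsemblePolicy(state_dim=8, n_actions=3)
print(policy.act(rng.normal(size=8)))
```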
1 code implementation • 17 Feb 2022 • Che Wang, Xufang Luo, Keith Ross, Dongsheng Li
We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks.
no code implementations • 29 Sep 2021 • Zhengyu Yang, Kan Ren, Xufang Luo, Weiqing Liu, Jiang Bian, Weinan Zhang, Dongsheng Li
Ensemble learning, which consistently improves prediction performance in supervised learning, has drawn increasing attention in reinforcement learning (RL).
no code implementations • ICLR 2022 • Dongqi Han, Tadashi Kozuno, Xufang Luo, Zhao-Yun Chen, Kenji Doya, Yuqing Yang, Dongsheng Li
How to make intelligent decisions is a central problem in machine learning and cognitive science.
no code implementations • 29 Sep 2021 • Han Zheng, Xufang Luo, Pengfei Wei, Xuan Song, Dongsheng Li, Jing Jiang
Specifically, we explicitly consider the difference between the online and offline data and apply an adaptive update scheme accordingly, i.e., a pessimistic update strategy for the offline dataset and a greedy or non-pessimistic update scheme for the online dataset.
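One way to read this scheme in code is the hedged sketch below: offline transitions bootstrap from a conservative min over two Q heads (a standard pessimism device), while online transitions bootstrap greedily. The two-head penalty and the exact target form are assumptions, not necessarily the paper's update rule.

```python
import numpy as np

def adaptive_q_target(reward, q1_next, q2_next, is_offline, gamma=0.99):
    """Bootstrapped target that is pessimistic on offline data only.

    Illustrative sketch: offline transitions use the conservative
    min-over-two-Q-heads value, while online transitions use the plain
    greedy value of the averaged heads.
    """
    greedy = ((q1_next + q2_next) / 2).max(axis=-1)          # online: greedy
    pessimistic = np.minimum(q1_next, q2_next).max(axis=-1)  # offline: pessimistic
    return reward + gamma * np.where(is_offline, pessimistic, greedy)

# Two transitions over 3 actions; the first comes from the offline dataset.
q1 = np.array([[1.0, 2.0, 1.5], [0.5, 0.7, 0.6]])
q2 = np.array([[0.8, 1.6, 1.9], [0.4, 0.9, 0.5]])
print(adaptive_q_target(np.array([0.1, 0.2]), q1, q2,
                        np.array([True, False])))
```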
no code implementations • 25 Sep 2019 • Xufang Luo, Qi Meng, Wei Chen, Tie-Yan Liu
Hence, new algorithms were developed that optimize directly in the path space (which is provably positively scale-invariant, PSI), such as Stochastic Gradient Descent (SGD) in the path space, and path-space SGD was shown to outperform its weight-space counterpart.
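To make the path-space idea concrete, the sketch below (assuming a two-layer ReLU network, where a path value is the product of the weights along an input-hidden-output path) checks that path values are invariant under the positive rescalings that leave the network function unchanged, whereas the raw weights are not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer ReLU net: x -> W1 -> ReLU -> W2 -> y
W1 = rng.normal(size=(4, 3))   # hidden x input
W2 = rng.normal(size=(2, 4))   # output x hidden

def path_values(W1, W2):
    """Value of each input->hidden->output path: W2[o, h] * W1[h, i].

    For ReLU nets, the network function is determined by these
    products (the path space), not by the raw weights themselves.
    """
    return np.einsum('oh,hi->ohi', W2, W1)

# Rescale hidden unit 0: incoming weights * c, outgoing weights / c.
# ReLU is positively homogeneous, so the network function is unchanged.
c = 3.0
W1s, W2s = W1.copy(), W2.copy()
W1s[0, :] *= c
W2s[:, 0] /= c

print(np.allclose(path_values(W1, W2), path_values(W1s, W2s)))  # True
# The weights themselves changed, so weight-space SGD sees a different
# point, while all path-space quantities are identical.
print(np.allclose(W1, W1s))  # False
```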
no code implementations • 27 Sep 2018 • Xufang Luo, Qi Meng, Di He, Wei Chen, Yunhong Wang, Tie-Yan Liu
Based on our observations, we formally define the expressiveness of the state extractor as the rank of the matrix composed of the representations.
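A minimal numerical illustration of that definition: stack the extractor's representations for a batch of states as rows of a matrix and measure its rank. The batch size, feature dimension, and NumPy's default rank tolerance here are illustrative choices, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def expressiveness(representations):
    """Rank of the matrix whose rows are state representations.

    Sketch of the definition: a higher rank means the extractor's
    outputs span more independent directions of the feature space.
    """
    return np.linalg.matrix_rank(representations)

states = rng.normal(size=(128, 16))            # 128 states, 16-dim features
low_rank_proj = rng.normal(size=(16, 3)) @ rng.normal(size=(3, 16))
collapsed = states @ low_rank_proj             # features confined to a 3-dim subspace

print(expressiveness(states))     # 16: a generic extractor is full rank
print(expressiveness(collapsed))  # 3: a degenerate extractor has low rank
```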