1 code implementation • 23 Oct 2024 • Yao Tang, Zhihui Xie, Zichuan Lin, Deheng Ye, Shuai Li
Motivated by how humans learn by organizing knowledge in a curriculum, CurrMask adjusts its masking scheme during pretraining for learning versatile skills.
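One common way such a masking scheme can be realized is block-wise masking, where the block size controls how long a skill horizon the model must infill and the mask ratio controls overall difficulty; a curriculum would then adjust these knobs over the course of pretraining. The helper below is a hypothetical sketch of that idea, not CurrMask's actual implementation.

```python
import numpy as np

def block_mask(seq_len, block_size, mask_ratio, rng):
    """Mask contiguous blocks of a trajectory sequence.

    block_size: length of each masked block (the "skill horizon").
    mask_ratio: target fraction of the sequence to mask.
    """
    mask = np.zeros(seq_len, dtype=bool)
    n_blocks = max(1, int(seq_len * mask_ratio / block_size))
    # Distinct start positions; blocks may still overlap, so the
    # realized ratio is at most mask_ratio.
    starts = rng.choice(seq_len - block_size + 1, size=n_blocks, replace=False)
    for s in starts:
        mask[s:s + block_size] = True
    return mask
```

A curriculum scheduler would call this with small `block_size` early in pretraining and grow it as the model improves.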
1 code implementation • 20 Nov 2023 • Tiantian Zhang, Kevin Zehua Shen, Zichuan Lin, Bo Yuan, Xueqian Wang, Xiu Li, Deheng Ye
On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting.
1 code implementation • 26 May 2023 • Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li
While promising, return conditioning is limited to training data labeled with rewards and therefore faces challenges in learning from unsupervised data.
1 code implementation • 5 Feb 2023 • Zichuan Lin, Xiapeng Wu, Mingfei Sun, Deheng Ye, Qiang Fu, Wei Yang, Wei Liu
Recent success in Deep Reinforcement Learning (DRL) methods has shown that policy optimization with respect to an off-policy distribution via importance sampling is effective for sample reuse.
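The standard instance of this idea weights each sample's advantage by the importance ratio between the current and the behavior policy, with clipping to keep reused off-policy data from destabilizing the update. The PPO-style surrogate below is a minimal sketch of that mechanism, not this paper's exact objective; the function name is illustrative.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, clip=0.2):
    """Per-sample clipped surrogate objective for off-policy sample reuse.

    logp_new/logp_old: log-probabilities of the taken actions under the
    current and behavior policies; advantages: estimated advantages.
    """
    ratio = np.exp(logp_new - logp_old)        # importance weight pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantages
    # Pessimistic minimum caps how much a single reused sample can help.
    return np.minimum(unclipped, clipped)
```

When the two policies agree the ratio is 1 and the objective reduces to the plain advantage; large ratios are truncated by the clip range.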
no code implementations • 8 Jan 2023 • Wenzhe Li, Hao Luo, Zichuan Lin, Chongjie Zhang, Zongqing Lu, Deheng Ye
Transformer has been considered the dominating neural architecture in NLP and CV, mostly under supervised settings.
no code implementations • 8 Nov 2022 • Zhihui Xie, Zichuan Lin, Junyou Li, Shuai Li, Deheng Ye
The past few years have seen rapid progress in combining reinforcement learning (RL) with deep learning.
1 code implementation • 21 Sep 2022 • Haibin Zhou, Tong Wei, Zichuan Lin, Junyou Li, Junliang Xing, Yuanchun Shi, Li Shen, Chao Yu, Deheng Ye
We study the adaptation of Soft Actor-Critic (SAC), widely considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous action spaces to discrete action spaces.
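The core change when moving SAC to discrete actions is that the policy becomes a categorical distribution, so the entropy term and the expected Q-value can be computed in closed form over all actions instead of through reparameterized sampling. The function below is a minimal sketch of that discrete policy loss under those standard assumptions; it is not the paper's specific algorithm, and the names are illustrative.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def discrete_sac_policy_loss(logits, q_values, alpha=0.2):
    """Policy loss for discrete-action SAC.

    logits: (batch, n_actions) policy logits.
    q_values: (batch, n_actions) critic estimates for every action.
    alpha: entropy temperature.
    """
    probs = softmax(logits)
    log_probs = np.log(probs + 1e-8)
    # Exact expectation over the discrete action set replaces the
    # sampled, reparameterized estimate used in continuous SAC.
    return (probs * (alpha * log_probs - q_values)).sum(axis=-1).mean()
```

Minimizing this pushes probability mass toward high-Q actions while the `alpha * log_probs` term keeps the policy stochastic.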
no code implementations • 1 Sep 2022 • Tiantian Zhang, Zichuan Lin, Yuxing Wang, Deheng Ye, Qiang Fu, Wei Yang, Xueqian Wang, Bin Liang, Bo Yuan, Xiu Li
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime, while minimizing the catastrophic forgetting of the learned information.
no code implementations • 17 Feb 2022 • Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman
With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track that permitted any solution, to encourage participation by newcomers.
no code implementations • 7 Dec 2021 • Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang
To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration.
no code implementations • 9 Jun 2021 • Zichuan Lin, Jing Huang, BoWen Zhou, Xiaodong He, Tengyu Ma
Recent work (Takanobu et al., 2020) proposed the system-wise evaluation on dialog systems and found that improvement on individual components (e.g., NLU, policy) in prior work may not necessarily bring benefit to pipeline systems in system-wise evaluation.
no code implementations • NeurIPS 2020 • Zichuan Lin, Derek Yang, Li Zhao, Tao Qin, Guangwen Yang, Tie-Yan Liu
In this work, we propose a set of novel reward decomposition principles by constraining uniqueness and compactness of different state features/representations relevant to different sub-rewards.
1 code implementation • NeurIPS 2020 • Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma
When the test task distribution is different from the training task distribution, the performance may degrade significantly.
no code implementations • NeurIPS 2019 • Zichuan Lin, Li Zhao, Derek Yang, Tao Qin, Guangwen Yang, Tie-Yan Liu
Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and one general class of such properties is the presence of multiple reward channels.
6 code implementations • NeurIPS 2019 • Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, Tie-Yan Liu
The key challenge in practical distributional RL algorithms lies in how to parameterize estimated distributions so as to better approximate the true continuous distribution.
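A common family of parameterizations represents the return distribution by a set of quantile values at chosen fractions and trains them with the quantile-regression Huber loss. The sketch below shows that generic loss as used across the QR-DQN/IQN/FQF line of work; it is a minimal illustration under those assumptions, not this paper's full parameterization (which also learns the fractions themselves).

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, taus, kappa=1.0):
    """Quantile-regression Huber loss.

    pred_quantiles: (N,) predicted quantile values theta_i at fractions taus.
    target_samples: (M,) samples from the target return distribution.
    """
    # Pairwise TD-style errors between targets and predicted quantiles.
    u = target_samples[None, :] - pred_quantiles[:, None]          # (N, M)
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}| turns the Huber loss into
    # quantile regression: each theta_i is pulled toward its fraction.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()
```

At the minimum, each predicted value sits at the corresponding quantile of the target distribution, so the learned discrete distribution tracks the true continuous one.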
Ranked #3 on Atari 2600 Skiing (using extra training data)
1 code implementation • 16 Apr 2019 • Guangxiang Zhu, Jianhao Wang, Zhizhou Ren, Zichuan Lin, Chongjie Zhang
We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability.
no code implementations • 19 May 2018 • Zichuan Lin, Tianqi Zhao, Guangwen Yang, Lintao Zhang
Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN).