no code implementations • 31 Mar 2024 • Changnan Xiao, Bing Liu
Length generalization (LG) is a challenging problem in learning to reason.
no code implementations • 22 Nov 2023 • Changnan Xiao, Bing Liu
However, numerous evaluations of the reasoning capabilities of LLMs have also shown some limitations.
1 code implementation • 22 Jun 2023 • Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Bing Liu
This paper shows that CIL is learnable.
no code implementations • 20 Apr 2023 • Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Zixuan Ke, Bing Liu
The key theoretical result is that, regardless of whether a CIL algorithm defines WP and OOD detection (or TP) explicitly or implicitly, good WP and good OOD detection are necessary and sufficient conditions for good CIL. This result unifies novelty/OOD detection and continual learning (CIL in particular).
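The decomposition behind this result can be illustrated with a toy numerical sketch: a class probability factors into a within-task probability times a task probability, so CIL prediction reduces to combining WP with TP/OOD scores. The numbers and variable names below are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Toy illustration of the WP/TP decomposition for class-incremental
# learning: the probability of class c belonging to task t factors as
#   P(c) = P(c | task t) * P(task t),
# so good within-task prediction (WP) and good task prediction
# (equivalently, OOD detection) together yield good CIL.

# Hypothetical within-task class probabilities, one entry per task.
wp_probs = [np.array([0.7, 0.3]),   # task 0: classes 0, 1
            np.array([0.1, 0.9])]   # task 1: classes 2, 3

# Hypothetical task-prediction (OOD-detection) probabilities.
tp_probs = np.array([0.8, 0.2])

# CIL scores over all four classes: P(c) = P(c | t) * P(t).
cil_scores = np.concatenate([wp * tp for wp, tp in zip(wp_probs, tp_probs)])
predicted_class = int(np.argmax(cil_scores))
print(cil_scores)
print(predicted_class)  # class 0 wins: 0.7 * 0.8 is the largest product
```

Because each factor is a proper distribution, the combined scores also sum to one, which is what lets guarantees on WP and TP transfer to the joint CIL prediction.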
no code implementations • 9 Mar 2023 • Changnan Xiao, Yongxin Zhang, Xuefeng Huang, Qinhan Huang, Jie Chen, Peng Sun
Strategy card games are a well-known genre that demands intelligent game-play and can serve as an ideal test bench for AI.
no code implementations • 7 Mar 2023 • Wei Xi, Yongxin Zhang, Changnan Xiao, Xuefeng Huang, Shihong Deng, Haowei Liang, Jie Chen, Peng Sun
Deep Reinforcement Learning combined with Fictitious Play shows impressive results on many benchmark games, most of which are, however, single-stage.
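For context on the single-stage baseline the snippet contrasts with, classic fictitious play has each player best-respond to the empirical mixture of the opponent's past actions in a one-shot matrix game. Below is a minimal sketch on rock-paper-scissors (a standard example, not this paper's method); the empirical strategies approach the uniform Nash equilibrium.

```python
import numpy as np

# Classic (single-stage) fictitious play on rock-paper-scissors.
# Row player's payoff matrix; the game is zero-sum.
payoff = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

counts = [np.ones(3), np.ones(3)]  # smoothed action counts per player
for _ in range(5000):
    emp0 = counts[0] / counts[0].sum()  # empirical strategy, player 0
    emp1 = counts[1] / counts[1].sum()  # empirical strategy, player 1
    a0 = int(np.argmax(payoff @ emp1))      # best response of player 0
    a1 = int(np.argmax(-(emp0 @ payoff)))   # best response of player 1
    counts[0][a0] += 1
    counts[1][a1] += 1

final_strategy = counts[0] / counts[0].sum()
print(final_strategy)  # close to the uniform equilibrium (1/3, 1/3, 1/3)
```

In multi-stage games, by contrast, a single empirical action distribution no longer summarizes the opponent, which is the gap the paper's setting targets.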
1 code implementation • 4 Nov 2022 • Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Zixuan Ke, Bing Liu
Continual learning (CL) learns a sequence of tasks incrementally.
no code implementations • 7 Jun 2022 • Jiajun Fan, Changnan Xiao
We then cast these two problems as a single training-data distribution optimization problem, namely obtaining the desired training data within a limited number of interactions, and address them concurrently via i) explicit modeling and control of the capacity and diversity of the behavior policy and ii) finer-grained, adaptive control of the behavior policy's selective/sampling distribution using a monotonic data distribution optimization.
Ranked #1 on Atari Games
1 code implementation • 17 Mar 2022 • Gyuhak Kim, Sepideh Esmaeilpour, Changnan Xiao, Bing Liu
Existing continual learning techniques focus on either task incremental learning (TIL) or class incremental learning (CIL) problem, but not both.
no code implementations • 11 Jun 2021 • Jiajun Fan, Changnan Xiao, Yue Huang
Deep Q Network (DQN) first opened the door to deep reinforcement learning (DRL) by combining deep learning (DL) with reinforcement learning (RL), and it was observed that the distribution of the acquired data changes during the training process.
Ranked #1 on Atari Games on Atari 2600 Freeway
no code implementations • 1 Jun 2021 • Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng
We find that value-based reinforcement learning methods with an ε-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration, and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem.
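The ε-greedy mechanism referred to here is standard: with probability ε the agent explores uniformly at random, otherwise it exploits the current action-value estimates. A minimal sketch in a toy setting (the array `q` and the sampling loop are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index under the epsilon-greedy mechanism."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore uniformly
    return int(np.argmax(q_values))              # exploit current values

rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2])  # toy action-value estimates

actions = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(10000)]
greedy_frac = actions.count(1) / len(actions)
# The greedy action (index 1) is chosen with probability
# (1 - eps) + eps / n_actions = 0.9 + 0.1/3, roughly 0.93.
print(greedy_frac)
```

The closed-form selection probabilities per action are what makes the exploration behavior of this mechanism easy to analyze.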
no code implementations • 9 May 2021 • Changnan Xiao, Haosen Shi, Jiajun Fan, Shihong Deng, Haiyan Yin
We study the problem of model-free reinforcement learning, which is often solved following the principle of Generalized Policy Iteration (GPI).
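Generalized Policy Iteration alternates (possibly approximate) policy evaluation with greedy policy improvement until the policy stabilizes. A self-contained sketch on a tiny deterministic MDP (the MDP itself is an illustrative assumption):

```python
import numpy as np

# GPI on a 2-state, 2-action deterministic MDP:
# P[s, a] = next state, R[s, a] = reward.
n_states, gamma = 2, 0.9
P = np.array([[0, 1], [0, 1]])
R = np.array([[0.0, 1.0], [2.0, 0.0]])

policy = np.zeros(n_states, dtype=int)
for _ in range(100):
    # Policy evaluation: solve V = R_pi + gamma * P_pi V exactly.
    P_pi = np.zeros((n_states, n_states))
    P_pi[np.arange(n_states), P[np.arange(n_states), policy]] = 1.0
    R_pi = R[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily w.r.t. the one-step lookahead.
    Q = R + gamma * V[P]
    new_policy = np.argmax(Q, axis=1)
    if np.array_equal(new_policy, policy):
        break  # policy is stable -> optimal
    policy = new_policy

print(policy)  # optimal policy: [1 0] (cycle between the two states)
```

Here evaluation is exact, but GPI covers the whole spectrum where evaluation is only partial (e.g. a few TD updates) before each improvement step.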
1 code implementation • 1 Jan 2021 • Dongyang Zhao, Yue Huang, Changnan Xiao, Yue Li, Shihong Deng
To address the problem posed by the environment, we propose a Meta Soft Hierarchical reinforcement learning framework (MeSH), in which each low-level sub-policy focuses on a specific sub-task and the high-level policy automatically learns to utilize the low-level sub-policies through meta-gradients.
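The two-level structure described, a high-level policy softly weighting low-level sub-policies, can be sketched as a mixture of action distributions. All names, shapes, and the linear parameterization below are assumptions for illustration, not the MeSH implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subpolicies, n_actions, obs_dim = 3, 4, 5

# Each low-level sub-policy: a linear layer followed by softmax.
sub_weights = rng.normal(size=(n_subpolicies, obs_dim, n_actions))
# High-level policy: a linear layer producing sub-policy logits.
high_weights = rng.normal(size=(obs_dim, n_subpolicies))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def act(obs, rng):
    # Soft weighting over sub-policies from the high-level policy.
    mix = softmax(obs @ high_weights)
    # Action distribution of each low-level sub-policy.
    sub_dists = np.array([softmax(obs @ w) for w in sub_weights])
    # Executed distribution is the mixture of sub-policy distributions.
    action_dist = mix @ sub_dists
    return int(rng.choice(n_actions, p=action_dist)), action_dist

obs = rng.normal(size=obs_dim)
action, dist = act(obs, rng)
print(action, dist)  # dist is a valid distribution over actions
```

Because the mixture weights are differentiable, gradients (and meta-gradients) can flow through the high-level weighting, which is what lets the high-level policy be trained end to end.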