1 code implementation • 9 Feb 2025 • Tenglong Liu, Jianxiong Li, Yinan Zheng, Haoyi Niu, Yixing Lan, Xin Xu, Xianyuan Zhan
In this paper, we propose Parametric Skill Expansion and Composition (PSEC), a new framework designed to iteratively evolve the agents' capabilities and efficiently address new challenges by maintaining a manageable skill library.
no code implementations • 26 Jan 2025 • Yinan Zheng, Ruiming Liang, Kexin Zheng, Jinliang Zheng, Liyuan Mao, Jianxiong Li, Weihao Gu, Rui Ai, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu
Achieving human-like driving behaviors in complex open-world environments is a critical challenge in autonomous driving.
no code implementations • 25 Jan 2025 • Xianyuan Zhan, Xiangyu Zhu, Peng Cheng, Xiao Hu, Ziteng He, Hanfei Geng, Jichao Leng, Huiwen Zheng, Chenhui Liu, Tianshun Hong, Yan Liang, Yunxin Liu, Feng Zhao
In a typical DC, around 30-40% of the energy is spent on the cooling system rather than on the computer servers, creating a pressing need for new energy-saving optimization technologies for DC cooling systems.
1 code implementation • 17 Jan 2025 • Jinliang Zheng, Jianxiong Li, Dongxiu Liu, Yinan Zheng, Zhihao Wang, Zhonghong Ou, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan
Training on diverse, internet-scale data is a key factor in the success of recent large foundation models.
1 code implementation • 15 Dec 2024 • Guan Wang, Haoyi Niu, Jianxiong Li, Li Jiang, Jianming Hu, Xianyuan Zhan
Among various branches of offline reinforcement learning (RL) methods, goal-conditioned supervised learning (GCSL) has gained increasing popularity, as it formulates the offline RL problem as a sequential modeling task and thereby bypasses the notoriously difficult credit-assignment challenge of value learning in the conventional RL paradigm.
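For readers unfamiliar with the GCSL formulation, the core recipe is supervised action prediction on hindsight-relabeled (state, future-state-as-goal) pairs. A minimal PyTorch-style sketch (network sizes, relabeling scheme, and names are illustrative assumptions, not this paper's implementation):

```python
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """pi(a | s, g): predicts an action from the current state and a goal."""
    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def gcsl_loss(policy, states, actions, future_states):
    # Hindsight relabeling: a state reached later in the same trajectory is
    # treated as the goal, and the policy regresses onto the action actually
    # taken -- no value function or temporal credit assignment is required.
    pred = policy(states, future_states)
    return ((pred - actions) ** 2).mean()
```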
no code implementations • 2 Oct 2024 • Jianxiong Li, Zhihao Wang, Jinliang Zheng, Xiaoai Zhou, Guanming Wang, Guanglu Song, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Junzhi Yu, Xianyuan Zhan
Multimodal task specification is essential for enhanced robotic performance, where Cross-modality Alignment enables the robot to holistically understand complex task instructions.
1 code implementation • 13 Sep 2024 • Haoyi Niu, Qimao Chen, Tenglong Liu, Jianxiong Li, Guyue Zhou, Yi Zhang, Jianming Hu, Xianyuan Zhan
This process effectively corrects underlying domain gaps, enhancing state realism and dynamics reliability in source data, and allowing flexible integration with various single-domain and cross-domain downstream policy learning methods.
no code implementations • 29 Jul 2024 • Liyuan Mao, Haoran Xu, Xianyuan Zhan, Weinan Zhang, Amy Zhang
In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution.
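One standard way to make this view concrete, in generic notation (not necessarily the notation used in the paper): the stationary distribution correction estimated by DICE-style methods is exactly the reweighting that transforms the behavior distribution into the optimal policy's distribution.

```latex
% d^{\mu}: state-action distribution of the behavior (data) policy
% d^{\pi^*}: state-action distribution of the optimal policy
% w^*: the stationary distribution correction ratio estimated by DICE
d^{\pi^*}(s,a) \;=\; w^*(s,a)\, d^{\mu}(s,a),
\qquad
w^*(s,a) \;=\; \frac{d^{\pi^*}(s,a)}{d^{\mu}(s,a)}.
```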
1 code implementation • 26 Jun 2024 • Yu Luo, Fuchun Sun, Tianying Ji, Xianyuan Zhan
Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by skillfully decomposing them into subgoals.
Hierarchical Reinforcement Learning • Reinforcement Learning
1 code implementation • 30 May 2024 • Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan
To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models.
Ranked #1 on Visual Question Answering on V*bench
1 code implementation • 29 May 2024 • Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
1 code implementation • 28 May 2024 • Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
Based on this insight, we present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that elegantly identifies the outperforming offline policy through value comparison, and uses it as an adaptive constraint to guarantee stronger policy learning performance.
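A heavily simplified, self-contained sketch of the value-comparison idea (the loss form, gating rule, and names are assumptions for illustration, not OBAC's exact algorithm):

```python
import torch

def adaptive_constraint_weight(q_online_value, q_offline_value):
    # 1 on states where the offline policy's value estimate currently exceeds
    # the online policy's, 0 elsewhere; gates the constraint term below.
    return (q_offline_value > q_online_value).float().detach()

def actor_loss(q_online_value, q_offline_value, log_prob_offline_actions, alpha=1.0):
    # Standard policy-improvement term plus a constraint toward the offline
    # policy that is active only where that policy is currently better.
    # log_prob_offline_actions: log-probability of the offline policy's
    # actions under the online actor (hypothetical input for this sketch).
    w = adaptive_constraint_weight(q_online_value, q_offline_value)
    return (-q_online_value + alpha * w * (-log_prob_offline_actions)).mean()

# Toy usage with random tensors standing in for critic/actor outputs
q_on, q_off, logp = torch.randn(32), torch.randn(32), torch.randn(32)
print(actor_loss(q_on, q_off, logp).item())
```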
1 code implementation • 19 Mar 2024 • Wenjun Zou, Yao Lyu, Jie Li, Yujie Yang, Shengbo Eben Li, Jingliang Duan, Xianyuan Zhan, Jingjing Liu, Yaqin Zhang, Keqiang Li
Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems.
1 code implementation • 28 Feb 2024 • Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan
Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding.
Ranked #1 on Contrastive Learning on 10,000 People - Human Pose Recognition Data (using extra training data)
1 code implementation • 7 Feb 2024 • Haoyi Niu, Jianming Hu, Guyue Zhou, Xianyuan Zhan
Consequently, researchers often resort to data from easily accessible source domains, such as simulation and laboratory environments, for cost-effective data acquisition and rapid model iteration.
1 code implementation • 1 Feb 2024 • Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan
To resolve this issue, we propose a simple yet effective modification that projects the backward gradient onto the normal plane of the forward gradient, resulting in an orthogonal-gradient update, a new learning rule for DICE-based methods.
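The projection itself is a one-liner; a minimal NumPy sketch (variable names and the flattened-gradient setup are assumptions, not the paper's implementation):

```python
import numpy as np

def orthogonal_gradient_update(g_forward, g_backward):
    # Project the backward gradient onto the normal plane of the forward
    # gradient, so the combined update carries no backward component along
    # the forward-gradient direction.
    parallel = (np.dot(g_backward, g_forward) /
                (np.dot(g_forward, g_forward) + 1e-12)) * g_forward
    g_orthogonal = g_backward - parallel
    return g_forward + g_orthogonal

# Toy usage with flattened parameter gradients
g_f = np.array([1.0, 0.0, 2.0])
g_b = np.array([0.5, 1.0, -1.0])
print(orthogonal_gradient_update(g_f, g_b))
```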
1 code implementation • 19 Jan 2024 • Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu
Interestingly, we discover that via reachability analysis of safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset.
no code implementations • 28 Dec 2023 • Huiling Qin, Xianyuan Zhan, Yuanxun li, Yu Zheng
Jointly solving these two tasks allows full utilization of information from both labeled and unlabeled data, thus alleviating the problem of over-reliance on labeled data.
no code implementations • 27 Nov 2023 • Jianxiong Li, Shichao Lin, Tianyu Shi, Chujie Tian, Yu Mei, Jian Song, Xianyuan Zhan, Ruimin Li
Specifically, we combine well-established traffic flow theory with machine learning to construct a reward inference model to infer the reward signals from coarse-grained traffic data.
no code implementations • 22 Sep 2023 • Haoyi Niu, Tianying Ji, Bingqi Liu, Haocheng Zhao, Xiangyu Zhu, Jianying Zheng, Pengfei Huang, Guyue Zhou, Jianming Hu, Xianyuan Zhan
Solving real-world complex tasks using reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging.
1 code implementation • 20 Sep 2023 • Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu
Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels.
Ranked #79 on Arithmetic Reasoning on GSM8K
1 code implementation • NeurIPS 2023 • Xiangsen Wang, Haoran Xu, Yinan Zheng, Xianyuan Zhan
Offline reinforcement learning (RL) has received considerable attention in recent years due to its attractive capability of learning policies from offline datasets without environmental interactions.
no code implementations • 15 Jun 2023 • Xiangsen Wang, Xianyuan Zhan
Offline reinforcement learning (RL) that learns policies from offline datasets without environment interaction has received considerable attention in recent years.
1 code implementation • NeurIPS 2023 • Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Shoucheng Song, Han Wang, Youfang Lin, Li Jiang
Based on extensive experiments, we find that TSRL achieves strong performance on small benchmark datasets with as few as 1% of the original samples, significantly outperforming recent offline RL algorithms in terms of data efficiency and generalizability. Code is available at: https://github.com/pcheng2/TSRL
1 code implementation • 5 Jun 2023 • Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang, Huazhe Xu
We find that such a long-neglected phenomenon is often related to the use of inferior actions from the current policy in Bellman updates as compared to the more optimal action samples in the replay buffer.
1 code implementation • 27 May 2023 • Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang
To unravel this mystery, we identify a long-neglected issue in the query selection schemes of existing PbRL studies: Query-Policy Misalignment.
1 code implementation • 25 May 2023 • Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang
Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance.
no code implementations • 18 Apr 2023 • Yujie Yang, Zhilong Zheng, Shengbo Eben Li, Jingliang Duan, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang
To address this challenge, we propose an indirect safe RL framework called feasible policy iteration, which guarantees that the feasible region monotonically expands and converges to the maximum one, and the state-value function monotonically improves and converges to the optimal one.
1 code implementation • 8 Apr 2023 • Fang Wu, Huiling Qin, Siyuan Li, Stan Z. Li, Xianyuan Zhan, Jinbo Xu
In the field of artificial intelligence for science, a persistent and essential challenge is the limited amount of labeled data available for real-world problems.
4 code implementations • 28 Mar 2023 • Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan
This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy.
1 code implementation • 3 Feb 2023 • Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang
RGM is formulated as a bi-level optimization problem: the upper layer optimizes a reward correction term that performs visitation distribution matching w.r.t.
1 code implementation • 15 Oct 2022 • Haoran Xu, Li Jiang, Jianxiong Li, Xianyuan Zhan
We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy.
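A minimal sketch of one way such a decomposition can be wired up, where the guide-policy proposes a target (here, a desirable next state) and the execute-policy produces the action to reach it; the specific interface and sizes are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class GuidePolicy(nn.Module):
    # Proposes a target (e.g., a desirable next state) from the current state.
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
    def forward(self, state):
        return self.net(state)

class ExecutePolicy(nn.Module):
    # Outputs the action that moves the agent from the current state toward
    # the target proposed by the guide.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
    def forward(self, state, target_state):
        return self.net(torch.cat([state, target_state], dim=-1))

# At decision time the two modules are chained:
state_dim, action_dim = 17, 6
guide, execute = GuidePolicy(state_dim), ExecutePolicy(state_dim, action_dim)
s = torch.randn(1, state_dim)
action = execute(s, guide(s))
```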
2 code implementations • 20 Jul 2022 • Haoran Xu, Xianyuan Zhan, Honglei Yin, Huiling Qin
We study the problem of offline Imitation Learning (IL) where an agent aims to learn an optimal expert behavior policy without additional online environment interactions.
1 code implementation • 18 Jul 2022 • Qiying Yu, Jieming Lou, Xianyuan Zhan, Qizhang Li, WangMeng Zuo, Yang Liu, Jingjing Liu
Contrastive learning (CL) has recently been applied to adversarial learning tasks.
no code implementations • 1 Jul 2022 • Wenjia Zhang, Haoran Xu, Haoyi Niu, Peng Cheng, Ming Li, Heming Zhang, Guyue Zhou, Xianyuan Zhan
In this paper, we propose the Discriminator-guided Model-based offline Imitation Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data against real expert demonstrations.
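For intuition, a minimal sketch of a discriminator over transition tuples (a standard GAIL-style binary discriminator; DMIL's actual discriminator jointly handles dynamics correctness and suboptimality, which this sketch does not reproduce):

```python
import torch
import torch.nn as nn

class TransitionDiscriminator(nn.Module):
    # Scores a transition (s, a, s'): high for real expert demonstrations,
    # low for model-rollout transitions.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s, a, s_next):
        return self.net(torch.cat([s, a, s_next], dim=-1))

def discriminator_loss(disc, expert_batch, rollout_batch):
    # Binary logistic loss: expert transitions labeled 1, rollouts labeled 0.
    bce = nn.BCEWithLogitsLoss()
    e_logit = disc(*expert_batch)
    r_logit = disc(*rollout_batch)
    return (bce(e_logit, torch.ones_like(e_logit)) +
            bce(r_logit, torch.zeros_like(r_logit)))
```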
1 code implementation • 27 Jun 2022 • Haoyi Niu, Shubham Sharma, Yiwen Qiu, Ming Li, Guyue Zhou, Jianming Hu, Xianyuan Zhan
This brings up a new question: is it possible to combine learning from limited real data in offline RL and unrestricted exploration through imperfect simulators in online RL to address the drawbacks of both approaches?
2 code implementations • 23 May 2022 • Jianxiong Li, Xianyuan Zhan, Haoran Xu, Xiangyu Zhu, Jingjing Liu, Ya-Qin Zhang
In offline reinforcement learning (RL), one detrimental issue to policy learning is the error accumulation of deep Q function in out-of-distribution (OOD) areas.
2 code implementations • 22 Oct 2021 • Guan Wang, Haoyi Niu, Desheng Zhu, Jianming Hu, Xianyuan Zhan, Guyue Zhou
Heated debates continue over the best autonomous driving framework.
no code implementations • 21 Oct 2021 • Jin Li, Xianyuan Zhan, Zixu Xiao, Guyue Zhou
End-to-end learning robotic manipulation with high data efficiency is one of the key challenges in robotics.
no code implementations • 14 Oct 2021 • Haoran Xu, Xianyuan Zhan, Jianxiong Li, Honglei Yin
In this work, we start from the performance difference between the learned policy and the behavior policy, and derive a new policy learning objective that can be used in the offline setting, which corresponds to the advantage function of the behavior policy multiplied by a state-marginal density ratio.
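In generic notation, the standard performance-difference identity behind this kind of objective reads as follows (this is the textbook form, not necessarily the exact objective derived in the paper):

```latex
J(\pi) - J(\beta)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{(s,a)\sim d^{\pi}}\!\left[A^{\beta}(s,a)\right]
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s\sim d^{\beta},\,a\sim\pi(\cdot\mid s)}\!
    \left[\frac{d^{\pi}(s)}{d^{\beta}(s)}\,A^{\beta}(s,a)\right],
```

where \beta is the behavior policy, A^{\beta} its advantage function, and d^{\pi}(s)/d^{\beta}(s) the state-marginal density ratio.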
no code implementations • 29 Sep 2021 • Huiling Qin, Xianyuan Zhan, Yuanxun li, Haoran Xu, Yu Zheng
Jointly solving these two tasks allows full utilization of information from both labeled and unlabeled data, thus alleviating the problem of over-reliance on labeled data.
no code implementations • 19 Jul 2021 • Haoran Xu, Xianyuan Zhan, Xiangyu Zhu
We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints, given only offline data and without further interaction with the environment.
no code implementations • 30 May 2021 • Huiling Qin, Xianyuan Zhan, Yu Zheng
We propose a correlation structure-based collective anomaly detection (CSCAD) model for the high-dimensional anomaly detection problem in large systems, which is also generalizable to semi-supervised and supervised settings.
1 code implementation • 16 May 2021 • Xianyuan Zhan, Xiangyu Zhu, Haoran Xu
Recent offline reinforcement learning (RL) studies have made considerable progress toward making RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction.
no code implementations • 23 Feb 2021 • Xianyuan Zhan, Haoran Xu, Yue Zhang, Xiangyu Zhu, Honglei Yin, Yu Zheng
Optimizing the combustion efficiency of a thermal power generating unit (TPGU) is a highly challenging and critical task in the energy industry.