Search Results for author: Xianyuan Zhan

Found 33 papers, 18 papers with code

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

no code implementations • 28 Feb 2024 • Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan

Multimodal pretraining has emerged as an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progression information; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding.

Contrastive Learning Decision Making +1

A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents

1 code implementation • 7 Feb 2024 • Haoyi Niu, Jianming Hu, Guyue Zhou, Xianyuan Zhan

Consequently, researchers often resort to data from easily accessible source domains, such as simulation and laboratory environments, for cost-effective data acquisition and rapid model iteration.

ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update

1 code implementation • 1 Feb 2024 • Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan

To resolve this issue, we propose a simple yet effective modification that projects the backward gradient onto the normal plane of the forward gradient, resulting in an orthogonal-gradient update, a new learning rule for DICE-based methods.
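
The projection described here is a standard vector operation. As a rough, illustrative sketch (not the authors' implementation), one way to project a backward gradient onto the normal plane of a forward gradient is shown below in NumPy; how the two gradients are obtained and the combination weight eta are assumptions for illustration only:

    import numpy as np

    def orthogonal_gradient(g_forward, g_backward, eps=1e-8):
        """Remove from g_backward its component along g_forward.

        Both gradients are flattened 1-D arrays of the same length; the result
        lies on the normal plane of g_forward, as in the excerpt above.
        """
        coeff = np.dot(g_backward, g_forward) / (np.dot(g_forward, g_forward) + eps)
        return g_backward - coeff * g_forward

    # Hypothetical usage: combine the forward gradient with the projected
    # backward gradient (eta is an assumed weighting hyperparameter).
    g_f = np.array([1.0, 0.0])
    g_b = np.array([0.5, 0.5])
    eta = 1.0
    update_direction = g_f + eta * orthogonal_gradient(g_f, g_b)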

Imitation Learning Offline RL +1

Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

1 code implementation • 19 Jan 2024 • Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

Interestingly, we discover that via reachability analysis of safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset.

Offline RL reinforcement-learning

FlexSSL: A Generic and Efficient Framework for Semi-Supervised Learning

no code implementations • 28 Dec 2023 • Huiling Qin, Xianyuan Zhan, Yuanxun Li, Yu Zheng

Jointly solving these two tasks allows full utilization of information from both labeled and unlabeled data, thus alleviating the problem of over-reliance on labeled data.

A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning

no code implementations • 27 Nov 2023 • Jianxiong Li, Shichao Lin, Tianyu Shi, Chujie Tian, Yu Mei, Jian Song, Xianyuan Zhan, Ruimin Li

Specifically, we combine well-established traffic flow theory with machine learning to construct a reward inference model to infer the reward signals from coarse-grained traffic data.

Offline RL Reinforcement Learning (RL)

H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps

no code implementations • 22 Sep 2023 • Haoyi Niu, Tianying Ji, Bingqi Liu, Haocheng Zhao, Xiangyu Zhu, Jianying Zheng, Pengfei Huang, Guyue Zhou, Jianming Hu, Xianyuan Zhan

Solving real-world complex tasks using reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging.

Offline RL Reinforcement Learning (RL)

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

1 code implementation • 20 Sep 2023 • Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu

Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels.

Arithmetic Reasoning Code Generation +1

Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization

1 code implementation • NeurIPS 2023 • Xiangsen Wang, Haoran Xu, Yinan Zheng, Xianyuan Zhan

Offline reinforcement learning (RL) has received considerable attention in recent years due to its attractive capability of learning policies from offline datasets without environmental interactions.

Management Multi-agent Reinforcement Learning +3

Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization

no code implementations • 15 Jun 2023 • Xiangsen Wang, Xianyuan Zhan

Offline reinforcement learning (RL) that learns policies from offline datasets without environment interaction has received considerable attention in recent years.

Management Multi-agent Reinforcement Learning +5

Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL

1 code implementation • NeurIPS 2023 • Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Shoucheng Song, Han Wang, Youfang Lin, Li Jiang

Based on extensive experiments, we find that TSRL achieves strong performance on small benchmark datasets with as few as 1% of the original samples, significantly outperforming recent offline RL algorithms in terms of data efficiency and generalizability. Code is available at: https://github.com/pcheng2/TSRL

Data Augmentation Offline RL +1

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

no code implementations • 5 Jun 2023 • Tianying Ji, Yu Luo, Fuchun Sun, Xianyuan Zhan, Jianwei Zhang, Huazhe Xu

Learning high-quality Q-value functions plays a key role in the success of many modern off-policy deep reinforcement learning (RL) algorithms.

Continuous Control Reinforcement Learning (RL)

Query-Policy Misalignment in Preference-Based Reinforcement Learning

no code implementations • 27 May 2023 • Xiao Hu, Jianxiong Li, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

To unravel this mystery, we identify a long-neglected issue in the query selection schemes of existing PbRL studies: Query-Policy Misalignment.

reinforcement-learning

PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning

1 code implementation • 25 May 2023 • Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang

Offline-to-online reinforcement learning (RL), by combining the benefits of offline pretraining and online finetuning, promises enhanced sample efficiency and policy performance.

Computational Efficiency reinforcement-learning +1

Feasible Policy Iteration

no code implementations • 18 Apr 2023 • Yujie Yang, Zhilong Zheng, Shengbo Eben Li, Jingliang Duan, Jingjing Liu, Xianyuan Zhan, Ya-Qin Zhang

To address this challenge, we propose an indirect safe RL framework called feasible policy iteration, which guarantees that the feasible region monotonically expands and converges to the maximum one, and the state-value function monotonically improves and converges to the optimal one.

Reinforcement Learning (RL) Safe Reinforcement Learning

InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems

1 code implementation • 8 Apr 2023 • Fang Wu, Huiling Qin, Siyuan Li, Stan Z. Li, Xianyuan Zhan, Jinbo Xu

In the field of artificial intelligence for science, a persistent and essential challenge is the limited amount of labeled data available for real-world problems.

molecular representation Representation Learning

Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

3 code implementations • 28 Mar 2023 • Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan

This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy.

D4RL Offline RL +2

Mind the Gap: Offline Policy Optimization for Imperfect Rewards

1 code implementation • 3 Feb 2023 • Jianxiong Li, Xiao Hu, Haoran Xu, Jingjing Liu, Xianyuan Zhan, Qing-Shan Jia, Ya-Qin Zhang

RGM is formulated as a bi-level optimization problem: the upper layer optimizes a reward correction term that performs visitation distribution matching w.r.t.

Reinforcement Learning (RL)

A Policy-Guided Imitation Approach for Offline Reinforcement Learning

1 code implementation • 15 Oct 2022 • Haoran Xu, Li Jiang, Jianxiong Li, Xianyuan Zhan

We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy.
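
As a minimal, purely illustrative sketch of such a decomposition (the interfaces below, a guide-policy that proposes a target next state and an execute-policy that maps the current state and target to an action, are assumptions rather than the paper's actual architecture):

    import numpy as np

    rng = np.random.default_rng(0)
    state_dim, action_dim = 4, 2

    # Placeholder parameters; in practice both policies would be learned networks.
    W_guide = rng.normal(scale=0.1, size=(state_dim, state_dim))
    W_exec = rng.normal(scale=0.1, size=(action_dim, 2 * state_dim))

    def guide_policy(state):
        """Proposes a target next state (assumed interface for the guide-policy)."""
        return state + W_guide @ state

    def execute_policy(state, target):
        """Maps (current state, target state) to an action (assumed interface)."""
        return W_exec @ np.concatenate([state, target])

    # The acting policy chains the two components.
    s = rng.normal(size=state_dim)
    action = execute_policy(s, guide_policy(s))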

D4RL Offline RL +3

Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations

2 code implementations • 20 Jul 2022 • Haoran Xu, Xianyuan Zhan, Honglei Yin, Huiling Qin

We study the problem of offline Imitation Learning (IL) where an agent aims to learn an optimal expert behavior policy without additional online environment interactions.

Imitation Learning Offline RL +1

Discriminator-Guided Model-Based Offline Imitation Learning

no code implementations • 1 Jul 2022 • Wenjia Zhang, Haoran Xu, Haoyi Niu, Peng Cheng, Ming Li, Heming Zhang, Guyue Zhou, Xianyuan Zhan

In this paper, we propose the Discriminator-guided Model-based offline Imitation Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data against real expert demonstrations.
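
As a loose illustration of the basic ingredient this builds on, a generic discriminator separating expert tuples from model rollouts, here is a toy logistic-regression sketch on made-up features; the feature construction and training loop are assumptions, and the actual DMIL discriminator additionally handles dynamics correctness and suboptimality jointly:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy features standing in for (state, action, next_state) tuples.
    expert = rng.normal(loc=0.0, size=(128, 6))   # real expert demonstrations
    rollout = rng.normal(loc=0.5, size=(128, 6))  # model rollout data
    X = np.vstack([expert, rollout])
    y = np.concatenate([np.ones(128), np.zeros(128)])  # 1 = expert, 0 = rollout

    # Train a logistic-regression discriminator with plain gradient descent.
    w, b, lr = np.zeros(X.shape[1]), 0.0, 0.1
    for _ in range(200):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y) / len(y))
        b -= lr * np.mean(p - y)

    # Discriminator scores can then weight or filter rollout samples.
    rollout_scores = sigmoid(rollout @ w + b)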

Imitation Learning

When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

1 code implementation • 27 Jun 2022 • Haoyi Niu, Shubham Sharma, Yiwen Qiu, Ming Li, Guyue Zhou, Jianming Hu, Xianyuan Zhan

This brings up a new question: is it possible to combine learning from limited real data in offline RL and unrestricted exploration through imperfect simulators in online RL to address the drawbacks of both approaches?

Offline RL reinforcement-learning +1

When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning

2 code implementations • 23 May 2022 • Jianxiong Li, Xianyuan Zhan, Haoran Xu, Xiangyu Zhu, Jingjing Liu, Ya-Qin Zhang

In offline reinforcement learning (RL), one detrimental issue to policy learning is the error accumulation of deep Q function in out-of-distribution (OOD) areas.

D4RL Offline RL +2

Offline Reinforcement Learning with Soft Behavior Regularization

no code implementations • 14 Oct 2021 • Haoran Xu, Xianyuan Zhan, Jianxiong Li, Honglei Yin

In this work, starting from the performance difference between the learned policy and the behavior policy, we derive a new policy learning objective that can be used in the offline setting; it corresponds to the advantage function of the behavior policy multiplied by a state-marginal density ratio.
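
Read literally, such an objective can be estimated from samples by weighting behavior-policy advantages with state-marginal density ratios. The following minimal sketch assumes both quantities come from separately learned estimators (hypothetical inputs, not the paper's implementation):

    import numpy as np

    def soft_behavior_reg_objective(density_ratio, behavior_advantage):
        """Sample-based estimate of an objective of the form
        E[ (d_pi(s) / d_beta(s)) * A_beta(s, a) ], as described in the excerpt.

        density_ratio:      per-sample estimates of the state-marginal ratio d_pi / d_beta
        behavior_advantage: per-sample advantage of the behavior policy at the
                            policy's actions
        """
        return np.mean(density_ratio * behavior_advantage)

    # Toy usage with made-up numbers.
    ratios = np.array([0.9, 1.1, 1.0])
    advantages = np.array([0.2, -0.1, 0.05])
    objective_value = soft_behavior_reg_objective(ratios, advantages)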

Continuous Control reinforcement-learning +1

Enhancing semi-supervised learning via self-interested coalitional learning

no code implementations • 29 Sep 2021 • Huiling Qin, Xianyuan Zhan, Yuanxun Li, Haoran Xu, Yu Zheng

Jointly solving these two tasks allows full utilization of information from both labeled and unlabeled data, thus alleviating the problem of over-reliance on labeled data.

Constraints Penalized Q-learning for Safe Offline Reinforcement Learning

no code implementations • 19 Jul 2021 • Haoran Xu, Xianyuan Zhan, Xiangyu Zhu

We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints, given only offline data and without further interaction with the environment.

Offline RL Q-Learning +2

CSCAD: Correlation Structure-based Collective Anomaly Detection in Complex System

no code implementations • 30 May 2021 • Huiling Qin, Xianyuan Zhan, Yu Zheng

We propose a correlation structure-based collective anomaly detection (CSCAD) model for high-dimensional anomaly detection problems in large systems, which is also generalizable to semi-supervised or supervised settings.

Anomaly Detection

Model-Based Offline Planning with Trajectory Pruning

1 code implementation • 16 May 2021 • Xianyuan Zhan, Xiangyu Zhu, Haoran Xu

Recent offline reinforcement learning (RL) studies have made considerable progress toward making RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction.

Offline RL Reinforcement Learning (RL)

DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning

no code implementations • 23 Feb 2021 • Xianyuan Zhan, Haoran Xu, Yue Zhang, Xiangyu Zhu, Honglei Yin, Yu Zheng

Optimizing the combustion efficiency of a thermal power generating unit (TPGU) is a highly challenging and critical task in the energy industry.

Continuous Control Offline RL +2
