Search Results for author: Qingyuan Wu

Found 12 papers, 3 papers with code

Inverse Delayed Reinforcement Learning

no code implementations • 4 Dec 2024 • Simon Sinong Zhan, Qingyuan Wu, Zhian Ruan, Frank Yang, Philip Wang, YiXuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu

Instead of relying on direct observations, our approach employs an efficient off-policy adversarial training framework to derive expert features and recover optimal policies from augmented delayed observations.

Tasks: Reinforcement Learning
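
A minimal sketch of what an adversarial imitation objective over delay-augmented inputs can look like (the dimensions, architecture, and GAIL-style loss below are illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn

obs_dim, act_dim, delay = 8, 2, 4  # hypothetical dimensions for illustration

# Discriminator over delay-augmented inputs:
# aug = concat(last observed state, actions taken during the delay window)
disc = nn.Sequential(
    nn.Linear(obs_dim + delay * act_dim, 256), nn.ReLU(), nn.Linear(256, 1)
)

def adversarial_loss(expert_aug, policy_aug):
    # GAIL/AIRL-style objective: the discriminator learns to separate expert
    # from policy data; its logits can then serve as a learned reward signal.
    bce = nn.BCEWithLogitsLoss()
    return (bce(disc(expert_aug), torch.ones(len(expert_aug), 1))
            + bce(disc(policy_aug), torch.zeros(len(policy_aug), 1)))
```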

Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments

no code implementations • 4 Oct 2024 • Simon Sinong Zhan, Qingyuan Wu, Philip Wang, YiXuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu

In this paper, we aim to tackle the limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments, where its theoretical guarantees no longer hold and performance degrades.

Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

no code implementations • 12 Jun 2024 • Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber

The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL).

Tasks: Reinforcement Learning (RL)
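
As a rough illustration of that planning module (a sketch of the generic VIN backup, not the paper's 5000-layer architecture; the kernel weights are assumed placeholders):

```python
import numpy as np
from scipy.signal import convolve2d

def vin_plan(r, kernels_r, kernels_v, n_iters):
    """Value-iteration module of a VIN on a 2-D latent MDP.
    r: (H, W) latent reward map; kernels_r / kernels_v: (A, 3, 3) learned
    convolution weights, one pair per abstract action."""
    v = np.zeros_like(r)
    for _ in range(n_iters):  # each iteration corresponds to one stacked layer
        q = np.stack([
            convolve2d(r, kr, mode="same") + convolve2d(v, kv, mode="same")
            for kr, kv in zip(kernels_r, kernels_v)
        ])
        v = q.max(axis=0)  # max over action channels = one VI backup
    return v
```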

Highway Value Iteration Networks

no code implementations • 5 Jun 2024 • Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber

To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs.

Tasks: Diversity, Safe Exploration

Highway Reinforcement Learning

no code implementations • 28 May 2024 • Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber

We show, however, that such importance-sampling-free (IS-free) methods underestimate the optimal value function (VF), especially for large $n$, which restricts their capacity to efficiently use information from distant future time steps.

Tasks: Q-Learning, Reinforcement Learning +2
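
To make the IS-free setup concrete, here is a generic uncorrected n-step Q-learning target (an illustrative sketch, not the paper's estimator); without importance-sampling corrections, the reward terms come from the behavior policy, which is the source of the bias the authors analyze:

```python
def nstep_target_no_is(rewards, gamma, q_boot):
    """Uncorrected n-step target:
    sum_{t=0}^{n-1} gamma^t * r_t + gamma^n * max_a Q(s_n, a).
    rewards: the n rewards observed under the behavior policy;
    q_boot: max_a Q(s_n, a) at the bootstrap state."""
    n = len(rewards)
    g = sum(gamma**t * r for t, r in enumerate(rewards))
    return g + gamma**n * q_boot
```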

Variational Delayed Policy Optimization

1 code implementation • 23 May 2024 • Qingyuan Wu, Simon Sinong Zhan, YiXuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang

In environments with delayed observation, state augmentation that appends the actions taken within the delay window is adopted to recover the Markov property and enable reinforcement learning (RL).

Tasks: Reinforcement Learning (RL), Variational Inference
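
A minimal sketch of that augmentation, assuming a classic Gym-style `env` with 1-D observations and a fixed observation delay (the wrapper below is illustrative, not the paper's implementation):

```python
from collections import deque
import numpy as np

class DelayedObsWrapper:
    """The agent sees the observation from `delay` steps ago, augmented with
    the actions taken since, which restores the Markov property."""
    def __init__(self, env, delay, act_dim):
        self.env, self.delay = env, delay
        self.zero_act = np.zeros(act_dim)

    def reset(self):
        self.obs_buf = deque([self.env.reset()])
        self.act_buf = deque([self.zero_act] * self.delay, maxlen=self.delay)
        return self._augment()

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        self.obs_buf.append(obs)
        self.act_buf.append(np.asarray(action))
        if len(self.obs_buf) > self.delay + 1:
            self.obs_buf.popleft()
        return self._augment(), rew, done, info

    def _augment(self):
        # augmented state = oldest buffered obs + actions in the delay window
        return np.concatenate([self.obs_buf[0], *self.act_buf])
```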

Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

1 code implementation • 5 Feb 2024 • Qingyuan Wu, Simon Sinong Zhan, YiXuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments.

Tasks: Reinforcement Learning +1
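
Read loosely from the abstract, the auxiliary task learns a value function under a short delay and uses it to guide learning under the long delay. One heavily simplified way such bootstrapping could look (a hypothetical sketch, not the paper's exact update):

```python
def long_delay_td_target(reward, gamma, q_aux, short_aug_state):
    """Hypothetical TD target for the long-delay critic that bootstraps from
    an auxiliary short-delay critic q_aux, whose smaller augmented state
    space makes it faster to learn."""
    return reward + gamma * q_aux(short_aug_state).max()
```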

State-Wise Safe Reinforcement Learning With Pixel Observations

1 code implementation • 3 Nov 2023 • Simon Sinong Zhan, YiXuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu

In the context of safe exploration, Reinforcement Learning (RL) has long grappled with the tradeoff between maximizing rewards and minimizing safety violations, particularly in complex environments with contact-rich or non-smooth dynamics and high-dimensional pixel observations.

Tasks: Reinforcement Learning +3

Learning Downstream Task by Selectively Capturing Complementary Knowledge from Multiple Self-supervisedly Learning Pretexts

no code implementations • 11 Apr 2022 • Jiayu Yao, Qingyuan Wu, Quan Feng, Songcan Chen

Self-supervised learning (SSL), as a newly emerging unsupervised representation learning paradigm, generally follows a two-stage learning pipeline: 1) learning invariant and discriminative representations with auto-annotation pretext(s), then 2) transferring the representations to assist downstream task(s).

Tasks: Representation Learning, Self-Supervised Learning
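
The two-stage pipeline can be sketched as follows (a toy illustration with assumed dimensions and a rotation-prediction pretext standing in for a generic auto-annotation pretext):

```python
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))

# Stage 1: auto-annotation pretext, e.g. predicting one of 4 image rotations;
# encoder and pretext head are trained jointly on the free pretext labels.
pretext_head = nn.Linear(64, 4)

# Stage 2: transfer -- freeze the learned representations and fit only a
# small head on the labeled downstream task.
for p in encoder.parameters():
    p.requires_grad = False
downstream_head = nn.Linear(64, 10)
```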

Topic Driven Adaptive Network for Cross-Domain Sentiment Classification

no code implementations • 28 Nov 2021 • Yicheng Zhu, Yiqiao Qiu, Qingyuan Wu, Fu Lee Wang, Yanghui Rao

In this vein, most approaches utilize domain adaptation, which maps data from different domains into a common feature space.

Tasks: Classification, Domain Adaptation +3
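
A standard way to pull two domains into a common feature space is to penalize the distance between their feature distributions; the linear-kernel maximum mean discrepancy below is one common choice (a generic illustration, not this paper's topic-driven network):

```python
import torch

def mmd_linear(source_feats, target_feats):
    """Linear-kernel MMD between batches of source and target features;
    minimizing it encourages a shared (domain-invariant) feature space."""
    delta = source_feats.mean(dim=0) - target_feats.mean(dim=0)
    return delta.dot(delta)
```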

Kuramoto model based analysis reveals oxytocin effects on brain network dynamics

no code implementations • 18 May 2021 • Shuhan Zheng, Zhichao Liang, Youzhi Qu, Qingyuan Wu, Haiyan Wu, Quanying Liu

Here, we propose a physics-based framework built on the Kuramoto model to investigate oxytocin's effects on phase-dynamic neural coupling in the default mode network (DMN) and the frontoparietal network (FPN).
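
For reference, the Kuramoto model couples N phase oscillators through the sine of their phase differences; a minimal simulation step (with illustrative parameters, not the paper's fitted values):

```python
import numpy as np

def kuramoto_step(theta, omega, K, dt):
    """Euler step of the Kuramoto model:
    dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    return theta + dt * (omega + (K / len(theta)) * coupling)

# The order parameter r = |mean(exp(1j * theta))| in [0, 1] quantifies the
# degree of phase synchrony across the network.
```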

Greedy-Step Off-Policy Reinforcement Learning

no code implementations • 23 Feb 2021 • Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan

Most policy evaluation algorithms are based on the Bellman expectation and optimality equations, which give rise to two popular approaches: Policy Iteration (PI) and Value Iteration (VI).

Tasks: Q-Learning, Reinforcement Learning +2
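
For context, a minimal tabular Value Iteration on an explicit MDP (a textbook illustration, not the paper's greedy-step method):

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-8):
    """P: (A, S, S) transition probabilities, R: (A, S) expected rewards.
    Repeatedly applies the Bellman optimality backup until convergence."""
    v = np.zeros(P.shape[1])
    while True:
        q = R + gamma * (P @ v)      # Q(a, s) for all actions and states
        v_new = q.max(axis=0)        # greedy (max) backup over actions
        if np.abs(v_new - v).max() < tol:
            return v_new, q.argmax(axis=0)   # optimal values and policy
        v = v_new
```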
