Search Results for author: Yuhui Wang

Found 24 papers, 9 papers with code

Cluster-Based Multi-Agent Task Scheduling for Space-Air-Ground Integrated Networks

no code implementations • 14 Dec 2024 • Zhiying Wang, Gang Sun, Yuhui Wang, Hongfang Yu, Dusit Niyato

The Space-Air-Ground Integrated Network (SAGIN) framework is a crucial foundation for future networks, where satellites and aerial nodes assist in computational task offloading.

Clustering Multi-agent Reinforcement Learning +1

RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction

no code implementations • 25 Oct 2024 • Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang

Jailbreak attacks circumvent LLMs' built-in safeguards by concealing harmful queries within jailbreak prompts.

Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

no code implementations • 12 Jun 2024 • Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber

The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL).

Reinforcement Learning (RL)
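
As background for the abstract above, here is a minimal tabular sketch of value iteration, the planning procedure the abstract says a VIN performs on a latent MDP. This is plain value iteration in pure Python, not the differentiable VIN architecture. The two-state MDP, the function names, and the parameter defaults are illustrative assumptions, not taken from the paper.

```python
def q_values(transitions, rewards, values, gamma):
    """Return Q[s][a] = r(s, a) + gamma * E[V(s')] for every state and action."""
    return [
        [
            rewards[s][a]
            + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
            for a in range(len(transitions[s]))
        ]
        for s in range(len(transitions))
    ]


def value_iteration(transitions, rewards, gamma=0.9, num_iters=500):
    """Repeatedly apply the Bellman optimality backup, then read off a greedy policy."""
    values = [0.0] * len(transitions)
    for _ in range(num_iters):
        values = [max(q_s) for q_s in q_values(transitions, rewards, values, gamma)]
    q = q_values(transitions, rewards, values, gamma)
    policy = [max(range(len(q_s)), key=q_s.__getitem__) for q_s in q]
    return values, policy


# Hypothetical toy MDP: transitions[s][a] is a list of (probability, next_state).
transitions = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
rewards = [[0.0, 0.0], [1.0, 0.0]]  # only "stay in state 1" is rewarded

values, policy = value_iteration(transitions, rewards)
```

On this toy problem the values converge to about 9 for state 0 and 10 for state 1, and the greedy policy moves to state 1 and then stays there.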

Highway Value Iteration Networks

no code implementations • 5 Jun 2024 • Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber

To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs.

Diversity Safe Exploration

Highway Reinforcement Learning

no code implementations • 28 May 2024 • Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber

We show, however, that such IS-free methods underestimate the optimal value function (VF), especially for large $n$, restricting their capacity to efficiently utilize information from distant future time steps.

Q-Learning reinforcement-learning +2

Variational Delayed Policy Optimization

1 code implementation • 23 May 2024 • Qingyuan Wu, Simon Sinong Zhan, YiXuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang

In environments with delayed observations, state augmentation, which includes the actions taken within the delay window in the state, is adopted to recover the Markov property and enable reinforcement learning (RL).

Reinforcement Learning (RL) Variational Inference
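
The state augmentation mentioned in the abstract above can be sketched as follows: the agent's input becomes the most recent (delayed) observation together with the actions taken since that observation was generated. The class name, its methods, and the `default_action` padding are hypothetical choices for illustration, not the paper's implementation.

```python
from collections import deque


class DelayedObservationAugmenter:
    """Pairs the latest delayed observation with the actions in the delay window."""

    def __init__(self, delay, initial_observation, default_action=0):
        self.observation = initial_observation
        # Before any action is taken, pad the window with a default action (assumption).
        self.pending_actions = deque([default_action] * delay, maxlen=delay)

    def step(self, delayed_observation, action):
        """Record the newly arrived observation and the action just taken."""
        self.observation = delayed_observation
        self.pending_actions.append(action)  # the oldest action drops out of the window
        return self.augmented_state()

    def augmented_state(self):
        return (self.observation, tuple(self.pending_actions))


augmenter = DelayedObservationAugmenter(delay=2, initial_observation="s0")
first = augmenter.augmented_state()
second = augmenter.step("s1", action=5)
third = augmenter.step("s2", action=7)
```

A policy would then be conditioned on the augmented tuple rather than on the stale observation alone.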

Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

1 code implementation • 5 Feb 2024 • Qingyuan Wu, Simon Sinong Zhan, YiXuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments.

reinforcement-learning Reinforcement Learning +1

Deep Reinforcement Learning Based Placement for Integrated Access Backhauling in UAV-Assisted Wireless Networks

no code implementations • 21 Dec 2023 • Yuhui Wang, Junaid Farooq

The advent of fifth generation (5G) networks has opened new avenues for enhancing connectivity, particularly in challenging environments like remote areas or disaster-struck regions.

Deep Reinforcement Learning

Learning to Identify Critical States for Reinforcement Learning from Videos

1 code implementation • ICCV 2023 • Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber

Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions.

Deep Reinforcement Learning reinforcement-learning

Guiding Online Reinforcement Learning with Action-Free Offline Pretraining

1 code implementation • 30 Jan 2023 • Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny

In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, and name this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL).

Offline RL reinforcement-learning +2

Research on Intellectual Property Resource Profile and Evolution Law

no code implementations • 13 Apr 2022 • Yuhui Wang, Yingxia Shao, Ang Li

In the era of big data, intellectual property-oriented scientific and technological resources show the trend of large data scale, high information density and low value density, which brings severe challenges to the effective use of intellectual property resources, and the demand for mining hidden information in intellectual property is increasing.

An Intellectual Property Entity Recognition Method Based on Transformer and Technological Word Information

no code implementations • 21 Mar 2022 • Yuhui Wang, Junping Du, Yingxia Shao

This paper proposes a method for extracting intellectual property entities based on Transformer and technical word information, and provides accurate word vector representation in combination with the BERT language method.

named-entity-recognition Named Entity Recognition +1

Resilient UAV Formation for Coverage and Connectivity of Spatially Dispersed Users

no code implementations • 11 Mar 2022 • Yuhui Wang, Junaid Farooq

Unmanned aerial vehicles (UAVs) are a convenient choice for carrying mobile base stations to rapidly set up communication services for ground users.

Greedy-Step Off-Policy Reinforcement Learning

no code implementations • 23 Feb 2021 • Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan

Most policy evaluation algorithms are based on the Bellman Expectation Equation and the Bellman Optimality Equation, which give rise to two popular approaches: Policy Iteration (PI) and Value Iteration (VI).

Q-Learning reinforcement-learning +2
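
For reference, the two Bellman equations named in the abstract above can be written in their standard textbook form. The notation here ($\pi$ for the policy, $r$ for the reward, $P$ for the transition kernel, $\gamma$ for the discount factor) is an assumption and may differ from the paper's.

```latex
% Bellman expectation equation (underlies policy evaluation in Policy Iteration)
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \Big[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^{\pi}(s') \Big]

% Bellman optimality equation (underlies Value Iteration)
V^{*}(s) = \max_{a} \Big[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^{*}(s') \Big]
```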

The Limit of the Batch Size

no code implementations • 15 Jun 2020 • Yang You, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, Cho-Jui Hsieh

For the first time we scale the batch size on ImageNet to at least a magnitude larger than all previous work, and provide detailed studies on the performance of many state-of-the-art optimization schemes under this setting.

SMIX($λ$): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

1 code implementation • 11 Nov 2019 • Xinghu Yao, Chao Wen, Yuhui Wang, Xiaoyang Tan

Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), as it has to deal with the issue that the joint action space increases exponentially with the number of agents in such scenarios.

reinforcement-learning Reinforcement Learning +3
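
To make the exponential growth mentioned in the abstract above concrete: if each of n agents chooses among k actions, there are k^n joint actions. The short sketch below counts and enumerates them for a small case. The function name and the example sizes are illustrative assumptions.

```python
from itertools import product


def joint_action_space_size(actions_per_agent, num_agents):
    """Number of joint actions when every agent has the same number of actions."""
    return actions_per_agent ** num_agents


# Enumerate the joint actions explicitly for a small case: 3 agents, 5 actions each.
joint_actions = list(product(range(5), repeat=3))

# Sizes for 1 to 4 agents with 5 actions each: 5, 25, 125, 625.
sizes = [joint_action_space_size(5, n) for n in range(1, 5)]
```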

Truly Proximal Policy Optimization

1 code implementation • 19 Mar 2019 • Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks.

Deep Reinforcement Learning

Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations

no code implementations • 15 Feb 2019 • Yuhui Wang, Hao He, Xiaoyang Tan

In real-world scenarios, the observation data for reinforcement learning with continuous control is commonly noisy and part of it may be dynamically missing over time, which violates the assumption of many current methods developed for this.

continuous-control Continuous Control +4

Trust Region-Guided Proximal Policy Optimization

2 code implementations • NeurIPS 2019 • Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

We formally show that this method not only improves the exploration ability within the trust region but enjoys a better performance bound compared to the original PPO as well.

Deep Reinforcement Learning Reinforcement Learning (RL)
