Search Results for author: Yuhui Wang

Found 16 papers, 7 papers with code

Learning to Identify Critical States for Reinforcement Learning from Videos

1 code implementation · 15 Aug 2023 · Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber

Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions.


Guiding Online Reinforcement Learning with Action-Free Offline Pretraining

1 code implementation · 30 Jan 2023 · Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny

In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, and we name this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL).

Tasks: Offline RL, reinforcement-learning, +1 more

Research on Intellectual Property Resource Profile and Evolution Law

no code implementations · 13 Apr 2022 · Yuhui Wang, Yingxia Shao, Ang Li

In the era of big data, intellectual property-oriented scientific and technological resources are characterized by large data scale, high information density, and low value density. This poses severe challenges to the effective use of intellectual property resources, and the demand for mining hidden information in intellectual property is increasing.

An Intellectual Property Entity Recognition Method Based on Transformer and Technological Word Information

no code implementations · 21 Mar 2022 · Yuhui Wang, Junping Du, Yingxia Shao

This paper proposes a method for extracting intellectual property entities based on Transformer and technical word information, and provides accurate word vector representations by combining it with the BERT language model.

Tasks: named-entity-recognition, Named Entity Recognition, +1 more

Resilient UAV Formation for Coverage and Connectivity of Spatially Dispersed Users

no code implementations · 11 Mar 2022 · Yuhui Wang, Junaid Farooq

Unmanned aerial vehicles (UAVs) are a convenient choice for carrying mobile base stations to rapidly set up communication services for ground users.

Greedy-Step Off-Policy Reinforcement Learning

no code implementations · 23 Feb 2021 · Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan

Most policy evaluation algorithms are based on the Bellman Expectation and Optimality Equations, from which two popular approaches are derived: Policy Iteration (PI) and Value Iteration (VI).

Tasks: Q-Learning, reinforcement-learning, +1 more
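The Greedy-Step entry above contrasts the two classic dynamic-programming schemes derived from the Bellman equations. As a rough illustration of those textbook algorithms (not the paper's greedy-step method), the sketch below runs Value Iteration, which repeatedly applies the Bellman optimality backup, and Policy Iteration, which alternates policy evaluation via the Bellman expectation backup with greedy improvement. The two-state MDP in `P`, the discount `GAMMA`, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8):
    """Repeatedly apply the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def policy_evaluation(policy, tol=1e-8):
    """Repeatedly apply the Bellman expectation backup for a fixed deterministic policy."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: q_value(V, s, policy[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def policy_iteration():
    """Alternate full policy evaluation with greedy policy improvement."""
    policy = {s: 0 for s in P}
    while True:
        V = policy_evaluation(policy)
        new_policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
        if new_policy == policy:
            return policy, V
        policy = new_policy


if __name__ == "__main__":
    print("VI values:", value_iteration())
    print("PI policy and values:", policy_iteration())
```

On this toy MDP both procedures should settle on taking action 1 in every state, with values close to 19 and 20. VI folds the max into every sweep, whereas PI first solves for the current policy's values and only then improves the policy.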

The Limit of the Batch Size

no code implementations · 15 Jun 2020 · Yang You, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, Cho-Jui Hsieh

For the first time, we scale the batch size on ImageNet to at least an order of magnitude larger than in all previous work, and provide detailed studies of the performance of many state-of-the-art optimization schemes in this setting.

SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

1 code implementation · 11 Nov 2019 · Xinghu Yao, Chao Wen, Yuhui Wang, Xiaoyang Tan

Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), because the joint action space grows exponentially with the number of agents.

Tasks: reinforcement-learning, Reinforcement Learning (RL), +2 more
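To make the exponential growth mentioned in the SMIX(λ) abstract concrete, the sketch below counts joint actions, assuming (hypothetically) that every agent shares the same discrete action set of size `num_actions`. It only illustrates the size of the space that a centralized value function over joint actions must cover; it is not the SMIX(λ) method, and the function names are illustrative.

```python
from itertools import product


def joint_action_count(num_actions: int, num_agents: int) -> int:
    """Size of the joint action space: each agent chooses independently, so |A|^n."""
    return num_actions ** num_agents


def enumerate_joint_actions(num_actions: int, num_agents: int) -> list[tuple[int, ...]]:
    """Explicitly list every joint action as a tuple of per-agent action indices."""
    return list(product(range(num_actions), repeat=num_agents))


if __name__ == "__main__":
    # Hypothetical setting: 5 discrete actions per agent.
    for n in range(1, 9):
        print(f"{n} agents -> {joint_action_count(5, n):,} joint actions")
    # Brute-force check of the closed-form count on a small case.
    assert len(enumerate_joint_actions(3, 4)) == joint_action_count(3, 4)
```

With 5 actions per agent, ten agents already yield 5^10 = 9,765,625 joint actions, which is why explicitly enumerating joint actions quickly becomes impractical.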

Truly Proximal Policy Optimization

1 code implementation · 19 Mar 2019 · Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Proximal policy optimization (PPO) is one of the most successful deep reinforcement learning methods, achieving state-of-the-art performance across a wide range of challenging tasks.

Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations

no code implementations · 15 Feb 2019 · Yuhui Wang, Hao He, Xiaoyang Tan

In real-world scenarios, the observation data for reinforcement learning with continuous control are commonly noisy, and parts of them may be dynamically missing over time, which violates the assumptions of many current methods developed for this setting.

Tasks: Continuous Control, Imputation, +2 more

Trust Region-Guided Proximal Policy Optimization

2 code implementations · NeurIPS 2019 · Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

We formally show that the proposed method not only improves exploration within the trust region but also enjoys a better performance bound than the original PPO.

Tasks: Reinforcement Learning (RL)
