Search Results for author: Ling Pan

Found 11 papers, 3 papers with code

Network Topology Optimization via Deep Reinforcement Learning

no code implementations 19 Apr 2022 Zhuoran Li, Xing Wang, Ling Pan, Lin Zhu, Zhendong Wang, Junlan Feng, Chao Deng, Longbo Huang

A2C-GS consists of three novel components, including a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology rating, and a DRL actor layer to conduct a topology search.

reinforcement-learning

Regularized Softmax Deep Multi-Agent Q-Learning

1 code implementation NeurIPS 2021 Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.

Multi-agent Reinforcement Learning Q-Learning +3

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

no code implementations 22 Nov 2021 Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu

Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.

Continuous Control Multi-agent Reinforcement Learning +2

Regularized Softmax Deep Multi-Agent $Q$-Learning

no code implementations 22 Mar 2021 Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.

Multi-agent Reinforcement Learning Q-Learning +3

Softmax Deep Double Deterministic Policy Gradients

1 code implementation NeurIPS 2020 Ling Pan, Qingpeng Cai, Longbo Huang

A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance.

Continuous Control

Multi-Path Policy Optimization

no code implementations 11 Nov 2019 Ling Pan, Qingpeng Cai, Longbo Huang

Recent years have witnessed a tremendous improvement of deep reinforcement learning.

Efficient Exploration

Deterministic Value-Policy Gradients

no code implementations 9 Sep 2019 Qingpeng Cai, Ling Pan, Pingzhong Tang

Based on this theoretical guarantee, we propose a class of the deterministic value gradient algorithm (DVG) with infinite horizon, and different rollout steps of the analytical gradients by the learned model trade off between the variance of the value gradients and the model bias.

Continuous Control reinforcement-learning

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

1 code implementation 14 Mar 2019 Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu

In this paper, we propose to update the value function with dynamic Boltzmann softmax (DBS) operator, which has good convergence property in the setting of planning and learning.

Atari Games Q-Learning +1

A Convergent Variant of the Boltzmann Softmax Operator in Reinforcement Learning

no code implementations 27 Sep 2018 Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Tie-Yan Liu

We then propose the dynamic Boltzmann softmax (DBS) operator to enable the convergence to the optimal value function in value iteration.

Atari Games Q-Learning +1

Deterministic Policy Gradients With General State Transitions

no code implementations 10 Jul 2018 Qingpeng Cai, Ling Pan, Pingzhong Tang

Such a setting generalizes the widely-studied stochastic state transition setting, namely the setting of deterministic policy gradient (DPG).

Continuous Control

A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems

no code implementations 13 Feb 2018 Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang

Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.

reinforcement-learning
