no code implementations • 21 Oct 2023 • Yiqin Tan, Ling Pan, Longbo Huang
Deep reinforcement learning has achieved remarkable performance in various domains by leveraging deep neural networks for approximating value functions and policies.
no code implementations • 5 Oct 2023 • Ling Pan, Moksh Jain, Kanika Madan, Yoshua Bengio
However, as they are typically trained from a given extrinsic reward function, how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks remains an important open challenge.
no code implementations • 5 Oct 2023 • Zarif Ikram, Ling Pan, Dianbo Liu
Due to limited resources and fast economic growth, designing optimal transportation road networks with traffic simulation and validation in a cost-effective manner is vital for developing countries, where extensive manual testing is expensive and often infeasible.
no code implementations • 4 Oct 2023 • Minsu Kim, Joohwan Ko, Dinghuai Zhang, Ling Pan, Taeyoung Yun, Woochang Kim, Jinkyoo Park, Yoshua Bengio
GFlowNets are probabilistic models that learn a stochastic policy that sequentially generates compositional structures, such as molecular graphs.
no code implementations • 4 Jul 2023 • Zhuoran Li, Ling Pan, Longbo Huang
We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL).
1 code implementation • 26 May 2023 • Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan
In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space.
1 code implementation • 19 Feb 2023 • Ling Pan, Dinghuai Zhang, Moksh Jain, Longbo Huang, Yoshua Bengio
Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control".
no code implementations • 11 Feb 2023 • Dinghuai Zhang, Ling Pan, Ricky T. Q. Chen, Aaron Courville, Yoshua Bengio
Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers in which an agent learns a stochastic policy for generating complex combinatorial structures through a series of decision-making steps.
1 code implementation • 3 Feb 2023 • Ling Pan, Nikolay Malkin, Dinghuai Zhang, Yoshua Bengio
Generative Flow Networks or GFlowNets are related to Markov chain Monte Carlo methods (as they sample from a distribution specified by an energy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood).
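As a minimal illustration of the sampling objective described above (a toy sketch with made-up objects and energies, not the training procedure from these papers): an ideally trained GFlowNet samples each terminal object x with probability proportional to its reward R(x) = exp(-E(x)). On an enumerable space this target can be computed exactly:

```python
import math
import random

# Hypothetical toy object space: bit-strings of length 3, with an
# arbitrary energy assigned to each complete object.
objects = [format(i, "03b") for i in range(8)]
energy = {x: x.count("1") * 0.5 for x in objects}  # E(x) grows with #ones

# The GFlowNet target distribution: p(x) proportional to R(x) = exp(-E(x)).
rewards = {x: math.exp(-energy[x]) for x in objects}
Z = sum(rewards.values())  # partition function (tractable only in toys)
target = {x: r / Z for x, r in rewards.items()}

# Here we sample from the target exactly, since the space is enumerable;
# a real GFlowNet amortizes this with a learned sequential policy.
rng = random.Random(0)
samples = rng.choices(objects, weights=[rewards[x] for x in objects], k=50_000)

# Empirical frequencies come out close to the target probabilities.
freq = {x: samples.count(x) / len(samples) for x in objects}
```

The point of the amortization is that in realistic spaces Z is intractable, so the policy must learn to produce samples with the right frequencies without ever computing it.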
no code implementations • 5 Dec 2022 • Can Chang, Ni Mu, Jiajun Wu, Ling Pan, Huazhe Xu
Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over $10+$ stages.
no code implementations • 7 Oct 2022 • Ling Pan, Dinghuai Zhang, Aaron Courville, Longbo Huang, Yoshua Bengio
We specify intermediate rewards by intrinsic motivation to tackle the exploration problem in sparse reward environments.
1 code implementation • 30 Aug 2022 • Pihe Hu, Ling Pan, Yu Chen, Zhixuan Fang, Longbo Huang
Multi-user delay constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing.
1 code implementation • 30 May 2022 • Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang
Training deep reinforcement learning (DRL) models usually requires high computation costs.
no code implementations • 19 Apr 2022 • Zhuoran Li, Xing Wang, Ling Pan, Lin Zhu, Zhendong Wang, Junlan Feng, Chao Deng, Longbo Huang
A2C-GS consists of three novel components, including a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology rating, and a DRL actor layer to conduct a topology search.
1 code implementation • NeurIPS 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson
Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.
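The overestimation problem mentioned above can be reproduced in a few lines (a sketch with made-up numbers, not this paper's method): when all actions have identical true value, maximizing over noisy estimates is biased upward, while the classic double-estimator trick of selecting the argmax with one set of estimates and evaluating it with an independent set removes most of that bias.

```python
import random

rng = random.Random(42)
N_ACTIONS, TRIALS, NOISE = 10, 20_000, 1.0
true_value = 0.0  # every action has the same true Q-value

single_bias = 0.0
double_bias = 0.0
for _ in range(TRIALS):
    # Two independent noisy estimates of the same Q-values.
    q_a = [true_value + rng.gauss(0, NOISE) for _ in range(N_ACTIONS)]
    q_b = [true_value + rng.gauss(0, NOISE) for _ in range(N_ACTIONS)]
    # Single estimator: max over noisy values overestimates.
    single_bias += max(q_a)
    # Double estimator: select the action with q_a, evaluate it with q_b.
    best = max(range(N_ACTIONS), key=lambda i: q_a[i])
    double_bias += q_b[best]

single_bias /= TRIALS   # markedly positive (~1.5 for 10 unit-Gaussian actions)
double_bias /= TRIALS   # close to the true value of 0
```

In the multi-agent setting studied here the same effect compounds across agents' joint action values, which is part of why the problem deserves separate attention.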
no code implementations • 22 Nov 2021 • Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu
Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.
no code implementations • 22 Mar 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson
Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.
1 code implementation • NeurIPS 2020 • Ling Pan, Qingpeng Cai, Longbo Huang
A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance.
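For context on the overestimation problem in DDPG, a widely used mitigation (the clipped double-Q target of TD3, shown here as a generic sketch, not the approach proposed in the paper above) bootstraps from the minimum of two target critics:

```python
def clipped_double_q_target(reward, gamma, q1_next, q2_next, done):
    """TD target that bootstraps from the minimum of two target-critic
    estimates at the next state, damping upward bias in either critic."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap

# e.g. clipped_double_q_target(1.0, 0.99, 5.0, 4.0, False)
# uses the lower estimate 4.0, giving 1.0 + 0.99 * 4.0
```

Taking the minimum trades a little underestimation for robustness against the max-induced upward bias.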
no code implementations • 11 Nov 2019 • Ling Pan, Qingpeng Cai, Longbo Huang
Recent years have witnessed tremendous improvements in deep reinforcement learning.
no code implementations • 9 Sep 2019 • Qingpeng Cai, Ling Pan, Pingzhong Tang
Based on this theoretical guarantee, we propose a class of infinite-horizon deterministic value gradient (DVG) algorithms, in which different rollout steps of the analytical gradients through the learned model trade off the variance of the value gradients against the model bias.
1 code implementation • 14 Mar 2019 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu
In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in both the planning and learning settings.
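The Boltzmann softmax operator underlying DBS can be written down directly (a sketch of the operator itself; the dynamic schedule for the temperature is this paper's contribution and is not reproduced here): boltz_β(X) = Σ_i x_i e^{β x_i} / Σ_j e^{β x_j}, which interpolates between the mean of the values at β = 0 and their max as β → ∞.

```python
import math

def boltzmann_softmax(values, beta):
    """boltz_beta(X) = sum_i x_i * exp(beta * x_i) / sum_j exp(beta * x_j).

    Equals mean(values) at beta = 0 and approaches max(values) as
    beta -> infinity; a DBS-style scheme would grow beta over iterations.
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

q = [1.0, 2.0, 3.0]
# boltzmann_softmax(q, 0.0) is the mean 2.0; large beta approaches max(q) = 3.0
```

Increasing β over time is what lets a value-iteration update smoothly move from mean-like (exploratory) backups toward max (greedy) backups.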
no code implementations • 27 Sep 2018 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Tie-Yan Liu
We then propose the dynamic Boltzmann softmax (DBS) operator to enable convergence to the optimal value function in value iteration.
no code implementations • 10 Jul 2018 • Qingpeng Cai, Ling Pan, Pingzhong Tang
Such a setting generalizes the widely-studied stochastic state transition setting, namely the setting of deterministic policy gradient (DPG).
no code implementations • 13 Feb 2018 • Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang
Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.