Policy Gradient Methods

88 papers with code • 0 benchmarks • 2 datasets

Most implemented papers

Action-dependent Control Variates for Policy Optimization via Stein's Identity

DartML/PPO-Stein-Control-Variate 30 Oct 2017

Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems.

Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

uber-research/deep-neuroevolution NeurIPS 2018

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better.

Remember and Forget for Experience Replay

cselab/smarties ICLR 2019

ER recalls experiences from past iterations to compute gradient estimates for the current policy, increasing data-efficiency.

On-Policy Trust Region Policy Optimisation with Replay Buffers

dkangin/baselines ICLR 2019

Building upon the recent success of deep reinforcement learning methods, we investigate the possibility of on-policy reinforcement learning improvement by reusing the data from several consecutive policies.

Trajectory-Based Off-Policy Deep Reinforcement Learning

boschresearch/DD_OPG 14 May 2019

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks.

Ekar: An Explainable Method for Knowledge Aware Recommendation

DeepGraphLearning/RecommenderSystems 22 Jun 2019

Recently, a variety of methods have been developed for this problem, which generally try to learn effective representations of users and items and then match items to users according to their representations.

Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

hsvgbkhgbv/SQDDPG 11 Jul 2019

To deal with this problem, we i) introduce a cooperative-game theoretical framework called extended convex game (ECG) that is a superset of global reward game, and ii) propose a local reward approach called Shapley Q-value.

Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations

venktesh22/ExpressLanes_Deep-RL 10 Sep 2019

This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on managed lanes with multiple access locations and heterogeneity in travelers' value of time, origin, and destination.

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

GRASP-ML/LPG-FTW NeurIPS 2020

Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems.

Analysis of the Optimization Landscape of Linear Quadratic Gaussian (LQG) Control

zhengy09/LQG_gradient 8 Feb 2021

This paper revisits the classical Linear Quadratic Gaussian (LQG) control from a modern optimization perspective.