
Policy Gradient Methods

28 papers with code · Methodology

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Latest papers with code

Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations

10 Sep 2019 venktesh22/ExpressLanes_Deep-RL

This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on managed lanes with multiple access locations and heterogeneity in travelers' value of time, origin, and destination.

POLICY GRADIENT METHODS

7
10 Sep 2019

Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

11 Jul 2019 hsvgbkhgbv/multi-agent-rl

Based on this framework, we propose a local reward approach called Shapley Q-value that can distribute the cumulative global rewards fairly, reflecting each agent's own contribution in contrast to the shared reward approach.

MULTI-AGENT REINFORCEMENT LEARNING · POLICY GRADIENT METHODS

3
11 Jul 2019
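
The Shapley Q-value entry above talks about distributing a global reward according to each agent's contribution. As a rough illustration of the underlying idea only, the sketch below computes exact classical Shapley values for a small game by averaging marginal contributions over all agent orderings. It is not the paper's method: the function names and the three-agent reward table are hypothetical, and exact enumeration is only feasible for a handful of agents.

```python
from itertools import permutations


def shapley_values(agents, value):
    """Exact Shapley values: average each agent's marginal contribution
    over every possible ordering of the agents."""
    totals = {agent: 0.0 for agent in agents}
    orderings = list(permutations(agents))
    for order in orderings:
        coalition = frozenset()
        for agent in order:
            with_agent = coalition | {agent}
            # Marginal contribution of this agent when joining the coalition.
            totals[agent] += value(with_agent) - value(coalition)
            coalition = with_agent
    return {agent: total / len(orderings) for agent, total in totals.items()}


def toy_global_reward(coalition):
    """Hypothetical global reward for each coalition of agents a, b, c."""
    table = {
        frozenset(): 0.0,
        frozenset("a"): 1.0,
        frozenset("b"): 2.0,
        frozenset("c"): 0.0,
        frozenset("ab"): 4.0,
        frozenset("ac"): 1.0,
        frozenset("bc"): 2.0,
        frozenset("abc"): 6.0,
    }
    return table[frozenset(coalition)]


credits = shapley_values(["a", "b", "c"], toy_global_reward)
# The per-agent credits sum to the reward of the full coalition (6.0 here).
```

In this toy game the credits come out to 13/6, 19/6 and 4/6 for agents a, b and c, so the full reward of 6.0 is split according to how much each agent adds on average.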

Ranking Policy Gradient

24 Jun 2019 illidanlab/rpg

To accelerate the learning of policy gradient methods, we describe a novel off-policy learning framework and establish the equivalence between maximizing the lower bound of return and imitating a near-optimal policy without accessing any oracles.

POLICY GRADIENT METHODS

4
24 Jun 2019

Explainable Knowledge Graph-based Recommendation via Deep Reinforcement Learning

22 Jun 2019 DeepGraphLearning/RecommenderSystems

Recently, a variety of methods have been developed for this problem, which generally try to learn effective representations of users and items and then match items to users according to their representations.

KNOWLEDGE GRAPHS · POLICY GRADIENT METHODS · RECOMMENDATION SYSTEMS

210
22 Jun 2019

Trajectory-Based Off-Policy Deep Reinforcement Learning

14 May 2019 boschresearch/DD_OPG

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks.

CONTINUOUS CONTROL · POLICY GRADIENT METHODS · STOCHASTIC OPTIMIZATION

4
14 May 2019

Evaluating Rewards for Question Generation Models

NAACL 2019 bloomsburyai/question-generation

Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation.

MACHINE TRANSLATION · POLICY GRADIENT METHODS · QUESTION GENERATION

99
28 Feb 2019

Fast Efficient Hyperparameter Tuning for Policy Gradients

18 Feb 2019 supratikp/HOOF

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.

META-LEARNING · POLICY GRADIENT METHODS

1
18 Feb 2019
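
The hyperparameter tuning entry above mentions reusing trajectories that the policy gradient method has already sampled. One common way to score a candidate policy on existing data is weighted importance sampling, sketched below. This is an assumption made for illustration, not a description of the paper's exact objective: the function names, the two-trajectory dataset and the candidate log-probabilities are all hypothetical.

```python
import math


def wis_return(returns, logp_new, logp_old):
    """Weighted importance sampling estimate of a candidate policy's return,
    using trajectories sampled from the old (behaviour) policy.

    returns:  total return of each sampled trajectory
    logp_new: log-probability of each trajectory's actions under the candidate
    logp_old: log-probability of the same actions under the sampling policy
    """
    log_w = [new - old for new, old in zip(logp_new, logp_old)]
    shift = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    return sum(w * r for w, r in zip(weights, returns)) / sum(weights)


def pick_candidate(returns, logp_old, candidate_logps):
    """Score every candidate on the same trajectories and return the best index."""
    scores = [wis_return(returns, logp, logp_old) for logp in candidate_logps]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


# Hypothetical data: two trajectories sampled with equal probability.
returns = [1.0, 3.0]
logp_old = [math.log(0.5), math.log(0.5)]
candidate_logps = [
    [math.log(0.8), math.log(0.2)],  # candidate 0 favours the low-return trajectory
    [math.log(0.2), math.log(0.8)],  # candidate 1 favours the high-return trajectory
]
best, scores = pick_candidate(returns, logp_old, candidate_logps)
```

Here candidate 1 gets the higher estimate (about 2.6 against 1.4) without any new environment interaction, which is the kind of saving the entry refers to as sample efficiency.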

On-Policy Trust Region Policy Optimisation with Replay Buffers

ICLR 2019 dkangin/baselines

Building upon the recent success of deep reinforcement learning methods, we investigate the possibility of on-policy reinforcement learning improvement by reusing the data from several consecutive policies.

CONTINUOUS CONTROL · POLICY GRADIENT METHODS

3
18 Jan 2019

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

5 Oct 2018 facebookresearch/WhereDidMyOptimumGo

We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment.

CONTINUOUS CONTROL · POLICY GRADIENT METHODS

11
05 Oct 2018