
Policy Gradient Methods

31 papers with code · Methodology

Leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Latest papers with code

A Nonparametric Off-Policy Policy Gradient

8 Jan 2020 jacarvalho/nopg

Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes.

DENSITY ESTIMATION · POLICY GRADIENT METHODS

1

Hindsight Trust Region Policy Optimization

ICLR 2020 HTRPOCODES/HTRPO-v2

Motivated by the demand for an effective deep reinforcement learning algorithm that accommodates sparse-reward environments, this paper presents Hindsight Trust Region Policy Optimization (HTRPO), a method that efficiently uses interactions under sparse-reward conditions to optimize policies within a trust region while maintaining learning stability.

POLICY GRADIENT METHODS

0
01 Jan 2020

Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations

10 Sep 2019 venktesh22/ExpressLanes_Deep-RL

This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on managed lanes with multiple access locations and heterogeneity in travelers' value of time, origin, and destination.

POLICY GRADIENT METHODS

12

Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

11 Jul 2019 hsvgbkhgbv/SQDDPG

To deal with this problem, we i) introduce a cooperative-game theoretical framework called extended convex game (ECG) that is a superset of global reward game, and ii) propose a local reward approach called Shapley Q-value.

MULTI-AGENT REINFORCEMENT LEARNING · POLICY GRADIENT METHODS

20

Ranking Policy Gradient

24 Jun 2019 illidanlab/rpg

Sample inefficiency is a long-lasting problem in reinforcement learning (RL).

POLICY GRADIENT METHODS

16

Explainable Knowledge Graph-based Recommendation via Deep Reinforcement Learning

22 Jun 2019 DeepGraphLearning/RecommenderSystems

Recently, a variety of methods have been developed for this problem, which generally try to learn effective representations of users and items and then match items to users according to their representations.

KNOWLEDGE GRAPHS · POLICY GRADIENT METHODS · RECOMMENDATION SYSTEMS

373

Distributional Policy Optimization: An Alternative Approach for Continuous Control

NeurIPS 2019 neurips-2019/GAC

We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.

CONTINUOUS CONTROL · POLICY GRADIENT METHODS

8
23 May 2019

Trajectory-Based Off-Policy Deep Reinforcement Learning

14 May 2019 boschresearch/DD_OPG

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks.

CONTINUOUS CONTROL · POLICY GRADIENT METHODS · STOCHASTIC OPTIMIZATION

5

Evaluating Rewards for Question Generation Models

NAACL 2019 bloomsburyai/question-generation

Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation.

MACHINE TRANSLATION · POLICY GRADIENT METHODS · QUESTION GENERATION

133
28 Feb 2019