Policy Gradient Methods

57 papers with code • 0 benchmarks • 2 datasets

Greatest papers with code

Proximal Policy Optimization Algorithms

labmlai/annotated_deep_learning_paper_implementations 20 Jul 2017

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.

Dota 2 • Policy Gradient Methods
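
As a rough illustration of the "surrogate" objective mentioned in the abstract above, the sketch below computes the clipped form usually associated with this paper. The excerpt here does not spell out the objective, so the function name `clipped_surrogate`, its parameter names, and the default `clip_eps=0.2` are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current policy and the policy that collected the data (hypothetical inputs).
    """
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the elementwise minimum removes the incentive to push the
    # ratio far outside [1 - clip_eps, 1 + clip_eps].
    return float(np.mean(np.minimum(unclipped, clipped)))
```

In a training loop this value would be maximized by stochastic gradient ascent over minibatches of sampled data, which needs an autodiff framework rather than plain NumPy.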

High-Dimensional Continuous Control Using Generalized Advantage Estimation

labmlai/annotated_deep_learning_paper_implementations 8 Jun 2015

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks.

Continuous Control • Policy Gradient Methods
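
The excerpt above does not define the estimator named in the title, so the following is only a minimal sketch of generalized advantage estimation as it is commonly stated: an exponentially weighted sum of one-step TD residuals. The function name `gae`, the argument layout (a bootstrap value appended to `values`), and the defaults for `gamma` and `lam` are assumptions.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Advantage estimates for one trajectory.

    rewards: length-T sequence of rewards.
    values:  length-(T + 1) sequence of value estimates; the last entry is
             the bootstrap value for the state after the final step.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(rewards)
    running = 0.0
    # Accumulate backwards: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` reduces each estimate to the one-step TD residual, while `lam=1` sums all discounted residuals to the end of the trajectory.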

Trust Region Policy Optimization

hill-a/stable-baselines 19 Feb 2015

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement.

Atari Games • Policy Gradient Methods

Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

uber-common/deep-neuroevolution NeurIPS 2018

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better.

Policy Gradient Methods • Q-Learning
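
To make the "black-box" and "parallelize better" points in the abstract above concrete, here is a minimal sketch of a plain ES update, not the novelty-seeking variants this paper proposes. Each perturbation is scored independently, which is the part that could be farmed out to separate workers. The names `es_step` and `toy_fitness`, the toy objective, and all hyperparameter values are hypothetical.

```python
import numpy as np

def es_step(theta, fitness, rng, pop_size=50, sigma=0.1, lr=0.02):
    """One plain ES update: perturb, score, and move along the weighted noise."""
    # One Gaussian perturbation per population member.
    noise = rng.standard_normal((pop_size, theta.size))
    # Each evaluation only needs the scalar score, never a gradient.
    scores = np.array([fitness(theta + sigma * eps) for eps in noise])
    # Normalize scores so the step size does not depend on the reward scale.
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad_estimate = noise.T @ scores / (pop_size * sigma)
    return theta + lr * grad_estimate

# Toy black-box objective (hypothetical): maximize -||x - target||^2.
target = np.array([1.0, -1.0])

def toy_fitness(x):
    return -float(np.sum((x - target) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(2)
start_score = toy_fitness(theta)
for _ in range(200):
    theta = es_step(theta, toy_fitness, rng)
end_score = toy_fitness(theta)
```

Comparing `end_score` with `start_score` shows whether the search moved toward the target; the loop over perturbations is sequential here only to keep the sketch short.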

Self-critical Sequence Training for Image Captioning

ruotianluo/ImageCaptioning.pytorch CVPR 2017

In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized.

Image Captioning • Policy Gradient Methods
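
The abstract above says the captioning system is optimized with reinforcement learning on the test metrics but does not give the update. As commonly described, self-critical training is REINFORCE with the reward of the model's own greedy caption used as the baseline. The sketch below shows that loss for a single caption; the function name `scst_loss` and its arguments are assumptions, and the rewards stand in for whatever caption metric is being optimized.

```python
import numpy as np

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """Self-critical REINFORCE loss for one sampled caption.

    sample_logprobs: per-token log-probabilities of the sampled caption.
    sample_reward:   metric score of the sampled caption.
    greedy_reward:   metric score of the greedy-decoded caption (the baseline).
    """
    advantage = sample_reward - greedy_reward
    # Minimizing this raises the probability of samples that beat the
    # greedy caption and lowers it for samples that do worse.
    return -advantage * float(np.sum(sample_logprobs))
```

When the sampled and greedy captions score the same, the loss is zero and the sample contributes no gradient.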

Explainable Knowledge Graph-based Recommendation via Deep Reinforcement Learning

DeepGraphLearning/RecommenderSystems 22 Jun 2019

Recently, a variety of methods have been developed for this problem, which generally try to learn effective representations of users and items and then match items to users according to their representations.

Knowledge Graphs • Policy Gradient Methods • +1

Evaluating Rewards for Question Generation Models

bloomsburyai/question-generation NAACL 2019

Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation.

Machine Translation • Policy Gradient Methods • +2

Deep Reinforcement Learning for Dialogue Generation

liuyuemaicha/Deep-Reinforcement-Learning-for-Dialogue-Generation-in-tensorflow EMNLP 2016

Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes.

Dialogue Generation • Policy Gradient Methods

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

shaneshixiang/rllabplusplus 7 Nov 2016

We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation.

Continuous Control • Policy Gradient Methods • +1

Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

snakeztc/NeuralDialog-LaRL NAACL 2019

Defining action spaces for conversational agents and optimizing their decision-making process with reinforcement learning is an enduring challenge.

Decision Making • Dialogue Generation • +5