Policy Gradient Methods

23 papers with code · Methodology

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Proximal Policy Optimization Algorithms

20 Jul 2017 NervanaSystems/coach

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.

DOTA 2 · POLICY GRADIENT METHODS
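
As an illustration of the kind of "surrogate" objective the Proximal Policy Optimization abstract mentions, here is a minimal Python sketch of the clipped form commonly associated with PPO. The abstract itself does not spell out the objective, so the function name `clipped_surrogate`, the default `epsilon=0.2`, and the plain-list inputs are assumptions made for illustration only.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Average clipped surrogate objective over a batch of samples.

    ratios:     new-policy / old-policy action probabilities, one per sample
    advantages: advantage estimates, one per sample
    epsilon:    clipping range (assumed default, not taken from the listing)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

In a full training loop this quantity would be maximized by stochastic gradient ascent on the policy parameters between rounds of data collection; the sketch only evaluates the objective for given ratios and advantages.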

Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

NeurIPS 2018 uber-research/deep-neuroevolution

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better.

POLICY GRADIENT METHODS · Q-LEARNING
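
To make the "black-box optimization" idea in the evolution strategies abstract concrete, the following is a minimal Python sketch of one basic ES update: perturb the parameters with Gaussian noise, score each perturbation, and move along the score-weighted noise. It is not the novelty-seeking variant the paper proposes; `es_step`, `toy_fitness`, and all hyperparameter values are hypothetical names and settings chosen for illustration.

```python
import random


def toy_fitness(params):
    """Hypothetical objective: highest (zero) when every parameter equals 3."""
    return -sum((p - 3.0) ** 2 for p in params)


def es_step(theta, fitness, rng, pop_size=50, sigma=0.1, lr=0.05):
    """Return theta after one basic evolution-strategy update.

    Only fitness values are used, never gradients of the fitness function,
    which is what makes the method black-box.
    """
    dim = len(theta)
    # One Gaussian noise vector per population member.
    noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    # Each member can be scored independently, so this step parallelizes well.
    scores = [
        fitness([t + sigma * n for t, n in zip(theta, noise)])
        for noise in noises
    ]
    mean_score = sum(scores) / pop_size
    # Score-weighted average of the noise; centering the scores reduces variance.
    step = [
        sum((s - mean_score) * noise[i] for s, noise in zip(scores, noises))
        / (pop_size * sigma)
        for i in range(dim)
    ]
    return [t + lr * g for t, g in zip(theta, step)]
```

Calling `es_step` repeatedly with a seeded `random.Random` instance, starting from `[0.0, 0.0]`, should move the parameters toward the optimum of `toy_fitness`.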

Trust Region Policy Optimization

19 Feb 2015 hill-a/stable-baselines

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement.

ATARI GAMES · POLICY GRADIENT METHODS

Self-critical Sequence Training for Image Captioning

CVPR 2017 ruotianluo/self-critical.pytorch

In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized.

IMAGE CAPTIONING · POLICY GRADIENT METHODS

High-Dimensional Continuous Control Using Generalized Advantage Estimation

8 Jun 2015 pat-coady/trpo

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks.

CONTINUOUS CONTROL · POLICY GRADIENT METHODS
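
The generalized advantage estimation entry above only quotes the paper's motivation, so as a rough illustration of the estimator named in its title, here is a minimal Python sketch of the usual backward recursion over one trajectory. The function name `gae_advantages`, the list-based inputs, and the default `gamma` and `lam` values are assumptions, not details taken from the listing.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Advantage estimates for one trajectory via the GAE recursion.

    rewards: rewards r_0 .. r_{T-1}
    values:  value estimates V(s_0) .. V(s_T); the final entry is the
             bootstrap value for the state after the last reward
    gamma:   discount factor (assumed default)
    lam:     GAE mixing parameter (assumed default)
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the estimate of the step after it.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` the result reduces to the one-step residuals, and with `lam=1` it becomes the discounted return minus the value baseline, which is the bias and variance trade-off the parameter is meant to control.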

Deep Reinforcement Learning for Dialogue Generation

EMNLP 2016 liuyuemaicha/Deep-Reinforcement-Learning-for-Dialogue-Generation-in-tensorflow

Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes.

CHATBOT · DIALOGUE GENERATION · POLICY GRADIENT METHODS

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

7 Nov 2016 shaneshixiang/rllabplusplus

We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation.

CONTINUOUS CONTROL · POLICY GRADIENT METHODS · Q-LEARNING

Evaluating Rewards for Question Generation Models

NAACL 2019 bloomsburyai/question-generation

Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation.

MACHINE TRANSLATION · POLICY GRADIENT METHODS