Policy Gradient Methods

87 papers with code • 0 benchmarks • 2 datasets

Most implemented papers

Proximal Policy Optimization Algorithms

labmlai/annotated_deep_learning_paper_implementations 20 Jul 2017

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
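
The "surrogate" objective mentioned above can be illustrated with a minimal sketch of the clipped variant that the PPO paper introduces. The function name `clipped_surrogate`, its argument names, and the default `clip_eps=0.2` are assumptions made for this illustration; it is not the code from the listed repository.

```python
import math

def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names here are illustrative.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two candidate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In practice the objective would be computed with an automatic-differentiation library and maximized by stochastic gradient ascent over several epochs on each batch of sampled data.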

Self-critical Sequence Training for Image Captioning

ruotianluo/neuraltalk2.pytorch CVPR 2017

In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized.
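
As a rough sketch of the self-critical idea named in the title, the reward of a sampled caption can be baselined by the reward of the model's own greedy caption. The function `self_critical_loss` and its arguments are hypothetical names for this illustration, and the rewards are assumed to be precomputed scores from a test metric such as CIDEr.

```python
def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    """REINFORCE-style loss for one caption with a greedy-decoding baseline.

    sample_logprobs: per-token log-probabilities of the sampled caption.
    sample_reward / greedy_reward: metric scores of the sampled and the
    greedily decoded caption (assumed to be computed elsewhere).
    """
    # Captions that beat the greedy baseline get a positive advantage.
    advantage = sample_reward - greedy_reward
    # Minimizing this loss raises the likelihood of better-than-greedy samples.
    return -advantage * sum(sample_logprobs)
```

A sampled caption that scores the same as the greedy one contributes zero loss, so only differences from the model's own test-time behavior drive the update.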

Trust Region Policy Optimization

DLR-RM/stable-baselines3 19 Feb 2015

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement.

High-Dimensional Continuous Control Using Generalized Advantage Estimation

labmlai/annotated_deep_learning_paper_implementations 8 Jun 2015

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks.
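
A minimal sketch of the generalized advantage estimator named in the title, computed backward over a single trajectory. The function name `gae_advantages`, the list-based inputs, and the default `gamma` and `lam` values are assumptions for this illustration; `values` is assumed to hold one extra bootstrap entry for the state after the last step.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Advantage estimates for one trajectory.

    rewards: r_0 ... r_{T-1}
    values:  V(s_0) ... V(s_T), i.e. len(rewards) + 1 entries.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of the residuals from step t onward.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` reduces each estimate to the one-step residual, while `lam=1` recovers the discounted return minus the value baseline, so `lam` trades bias against variance.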

Deep Reinforcement Learning for Dialogue Generation

liuyuemaicha/Deep-Reinforcement-Learning-for-Dialogue-Generation-in-tensorflow EMNLP 2016

Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes.

Competitive Policy Optimization

manish-pra/copg 18 Jun 2020

A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties.

Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

snakeztc/NeuralDialog-LaRL NAACL 2019

Defining action spaces for conversational agents and optimizing their decision-making process with reinforcement learning is an enduring challenge.

Distributional Policy Optimization: An Alternative Approach for Continuous Control

tesslerc/GAC NeurIPS 2019

We show that optimizing over such sets results in local movement in the action space and thus convergence to sub-optimal solutions.

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

allenai/rl4lms 3 Oct 2022

To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL.

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

shaneshixiang/rllabplusplus 7 Nov 2016

We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation.