Policy Gradient Methods
96 papers with code • 0 benchmarks • 2 datasets
Most implemented papers
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
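The core of PPO is a clipped surrogate objective that keeps the new policy's probability ratio close to 1. A minimal PyTorch sketch of that loss (function name and tensor shapes are illustrative; clip_eps=0.2 is the commonly cited default from the paper's experiments):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate: maximize E[min(r*A, clip(r, 1-eps, 1+eps)*A)],
    where r = pi_new(a|s) / pi_old(a|s)."""
    ratio = torch.exp(logp_new - logp_old)  # probability ratio r_t
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Elementwise minimum gives a pessimistic bound on the objective;
    # negate because optimizers minimize.
    return -torch.min(unclipped, clipped).mean()
```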
Self-critical Sequence Training for Image Captioning
In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized.
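The "self-critical" trick uses the score of the model's own greedy decoding as the REINFORCE baseline, so only samples that beat the test-time output get reinforced. A hedged sketch (variable names are illustrative; rewards would come from a metric such as CIDEr):

```python
import torch

def scst_loss(sample_logps, sample_rewards, greedy_rewards):
    """REINFORCE with a self-critical baseline.

    sample_logps:   summed token log-probs of sampled captions, shape [B]
    sample_rewards: metric score of each sampled caption, shape [B]
    greedy_rewards: metric score of each greedily decoded caption, shape [B]
    """
    # Greedy decoding serves as the baseline; no gradient flows through it.
    advantage = (sample_rewards - greedy_rewards).detach()
    # Increase the likelihood of samples that outperform greedy decoding.
    return -(advantage * sample_logps).mean()
```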
Trust Region Policy Optimization
We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement.
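Concretely, each TRPO iteration maximizes a surrogate objective subject to a KL-divergence trust-region constraint; practical implementations solve this with conjugate gradient plus a backtracking line search. A simplified sketch of the two ingredients, assuming categorical policies (names and the max_kl value are illustrative):

```python
import torch

def surrogate(logp_new, logp_old, advantages):
    """Surrogate objective L(theta) = E[(pi_new / pi_old) * A]."""
    return (torch.exp(logp_new - logp_old) * advantages).mean()

def within_trust_region(p_old, p_new, max_kl=0.01):
    """Constraint: mean KL(pi_old || pi_new) <= max_kl, for categorical
    action probabilities of shape [batch, n_actions]."""
    kl = (p_old * (p_old.log() - p_new.log())).sum(dim=-1).mean()
    return kl <= max_kl
```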
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks.
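GAE itself is an exponentially weighted sum of TD residuals, A_t = sum_l (gamma*lambda)^l * delta_{t+l} with delta_t = r_t + gamma*V(s_{t+1}) - V(s_t). A sketch of the standard backward recursion for a single un-truncated trajectory (done-flag masking omitted; the gamma and lambda defaults are illustrative):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via backward recursion.

    rewards: length-T array of rewards
    values:  length-(T+1) array of value estimates (bootstrap at the end)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```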
Deep Reinforcement Learning for Dialogue Generation
Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes.
Competitive Policy Optimization
A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties.
Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models
Defining action spaces for conversational agents and optimizing their decision-making process with reinforcement learning is an enduring challenge.
Distributional Policy Optimization: An Alternative Approach for Continuous Control
We show that optimizing over predefined parametric distribution sets (e.g., Gaussian policies) results in local movement in the action space and thus convergence to sub-optimal solutions.
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
To help answer this question, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL.
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation.
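The control-variate idea the abstract invokes: subtract a quantity correlated with the Monte Carlo estimate and add its known expectation back analytically, leaving the estimator unbiased but with lower variance. A toy numerical illustration of that principle (not the Q-Prop estimator itself, which uses a Taylor expansion of the off-policy critic as the control variate):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)

f = x ** 2  # quantity whose mean we estimate (true mean is 5.0)
h = x       # control variate with known expectation E[h] = 1.0

beta = np.cov(f, h)[0, 1] / np.var(h)            # variance-minimizing coefficient
naive = f.mean()
controlled = (f - beta * h).mean() + beta * 1.0  # unbiased, lower variance

print(f"naive={naive:.3f}  controlled={controlled:.3f}")
print(f"variance: {f.var():.2f} -> {(f - beta * h).var():.2f}")
```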