
Policy Gradient Methods

31 papers with code · Methodology

Leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Latest papers without code

GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction

17 Feb 2020

In this work we present a new method of black-box optimization and constraint satisfaction.

POLICY GRADIENT METHODS

Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning

12 Feb 2020

We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to use data from several tasks, each represented by a Markov Decision Process (MDP), to find a policy that can be adapted to the realized MDP by one step of stochastic policy gradient.

META-LEARNING · POLICY GRADIENT METHODS
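
A minimal first-order sketch of the one-step adaptation described above, assuming a generic `grad_estimate(theta, mdp)` callable that returns a stochastic policy-gradient estimate (all names and the dummy estimator below are illustrative assumptions, not the paper's method):

```python
import numpy as np

def fomaml_rl_step(theta, mdps, grad_estimate, inner_lr=0.1, outer_lr=0.01):
    """First-order sketch of one MAML-RL meta-update: adapt to each task with a
    single stochastic policy-gradient step, then average the gradients measured
    at the adapted parameters. (The paper analyzes convergent estimators of the
    full meta-gradient, which also involves second-order terms; this first-order
    variant only illustrates the one-step-adaptation objective.)"""
    meta_grad = np.zeros_like(theta)
    for mdp in mdps:
        g_inner = grad_estimate(theta, mdp)           # stochastic policy gradient on the task
        theta_task = theta + inner_lr * g_inner       # one-step adaptation for this MDP
        meta_grad += grad_estimate(theta_task, mdp)   # gradient evaluated after adaptation
    return theta + outer_lr * meta_grad / len(mdps)

# Dummy estimator so the sketch runs end to end; a real one would roll out the
# policy parameterized by theta in the given MDP.
theta = np.zeros(8)
theta = fomaml_rl_step(theta, mdps=[None, None],
                       grad_estimate=lambda th, mdp: -th + 1.0)
```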

Statistically Efficient Off-Policy Policy Gradients

10 Feb 2020

Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value.

POLICY GRADIENT METHODS
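
As a generic illustration of the gradient step described above, here is a REINFORCE-style update for a tabular softmax policy (a textbook sketch with an assumed `(state, action, reward)` episode format, not the off-policy estimator studied in the paper):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, episodes, lr=0.01, gamma=0.99):
    """One vanilla policy-gradient step: theta <- theta + lr * estimate of grad J(theta).
    theta has shape (n_states, n_actions); each episode is a list of (state, action, reward)."""
    grad = np.zeros_like(theta)
    for episode in episodes:
        G, returns = 0.0, []
        for _, _, r in reversed(episode):              # discounted return-to-go
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        for (s, a, _), G_t in zip(episode, returns):
            probs = softmax(theta[s])
            grad_log = -probs                          # d log pi(a|s) / d theta[s, :]
            grad_log[a] += 1.0
            grad[s] += G_t * grad_log                  # score-function (REINFORCE) estimator
    return theta + lr * grad / len(episodes)           # ascend the estimated gradient

theta = np.zeros((4, 2))                               # 4 states, 2 actions
episode = [(0, 1, 1.0), (2, 0, 0.0), (3, 1, 1.0)]      # (state, action, reward) triples
theta = reinforce_update(theta, [episode])
```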

Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts

7 Feb 2020

We first obtain an ensemble of experts, one for each latent MDP, and fuse their advice to compute a baseline policy.

DECISION MAKING · POLICY GRADIENT METHODS
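
One plausible way to fuse the experts' advice is a belief-weighted mixture of their action distributions; the sketch below illustrates that idea only (the fusion rule, function name, and shapes are assumptions for illustration, not necessarily the rule used in the paper):

```python
import numpy as np

def fuse_expert_advice(expert_action_probs, belief):
    """Belief-weighted mixture of expert policies: expert_action_probs has shape
    (n_experts, n_actions); belief is the posterior over latent MDPs, shape (n_experts,)."""
    mixed = belief @ expert_action_probs          # weight each expert by its belief mass
    return mixed / mixed.sum()                    # renormalize to a valid distribution

# Example: two latent MDPs, three actions, belief leaning toward the first expert.
experts = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6]])
print(fuse_expert_advice(experts, np.array([0.8, 0.2])))
```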

Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks

31 Jan 2020

We present Neural MMO, a massively multiagent game environment inspired by MMOs and discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.

POLICY GRADIENT METHODS

Deep Reinforcement Learning based Blind mmWave MIMO Beam Alignment

25 Jan 2020

Directional beamforming is a crucial component for realizing robust wireless communication systems using millimeter wave (mmWave) technology.

POLICY GRADIENT METHODS

A Stochastic Derivative Free Optimization Method with Momentum

ICLR 2020

In particular, we propose SMTP, a momentum version of the stochastic three-point method (STP) of Bergou et al. (2019).

CONTINUOUS CONTROL · POLICY GRADIENT METHODS
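
For readers unfamiliar with STP, the sketch below shows the basic three-point step with a naive heavy-ball-style momentum term added on; the momentum rule here is purely an illustrative assumption and is not necessarily the SMTP update proposed in the paper:

```python
import numpy as np

def stp_with_momentum(f, x0, n_iters=1000, step=0.1, beta=0.9, seed=0):
    """Derivative-free optimization sketch: each STP iteration compares f at x,
    x + step*s, and x - step*s for a random unit direction s and keeps the best."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    v = np.zeros_like(x)
    for _ in range(n_iters):
        s = rng.standard_normal(x.shape)
        s /= np.linalg.norm(s)                        # random unit direction
        candidates = [x, x + step * s, x - step * s]  # the three points of STP
        best = min(candidates, key=f)
        v = beta * v + (best - x)                     # naive momentum over accepted moves
        trial = best + beta * v
        x = trial if f(trial) < f(best) else best     # keep the momentum jump only if it helps
    return x

print(stp_with_momentum(lambda z: float(np.sum(z ** 2)), np.ones(5)))
```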

AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING

ICLR 2020

The influence of the ensemble of dynamics models on the policy update is controlled by adjusting the number of virtual rollouts performed in the next iteration according to the ratio of real to virtual total reward.

POLICY GRADIENT METHODS
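
The sketch below shows one plausible version of such a control rule, scaling the next iteration's virtual-rollout budget by the real-to-virtual reward ratio (the function name, clipping, and bounds are assumptions for illustration, not the paper's exact schedule):

```python
def next_virtual_rollouts(n_virtual, real_return, virtual_return, n_min=0, n_max=100):
    """Scale the number of model-based ("virtual") rollouts for the next iteration
    by how well the dynamics-model ensemble tracks the real environment, measured
    as the ratio of real to virtual total reward."""
    ratio = real_return / (abs(virtual_return) + 1e-8)
    ratio = min(max(ratio, 0.0), 2.0)                 # clip to avoid extreme jumps
    return int(min(max(round(n_virtual * ratio), n_min), n_max))

# Example: the model over-estimates reward, so fewer virtual rollouts are used next time.
print(next_virtual_rollouts(n_virtual=50, real_return=80.0, virtual_return=120.0))
```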

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

ICLR 2020

This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$.

POLICY GRADIENT METHODS
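
To make the improvement explicit, the exponents quoted above combine as $O(1/\epsilon^{5/3}) \,/\, O(1/\epsilon^{1/6}) = O(1/\epsilon^{5/3 - 1/6}) = O(1/\epsilon^{3/2})$, i.e. a sample complexity of $O(1/\epsilon^{3/2})$ for the recursively variance-reduced estimator.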

Policy Tree Network

ICLR 2020

However, decision-time planning with implicit dynamics models in continuous action space has proven to be a difficult problem.

POLICY GRADIENT METHODS · Q-LEARNING