
Policy Gradient Methods

28 papers with code · Methodology

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Latest papers without code

Transfer Reward Learning for Policy Gradient-Based Text Generation

9 Sep 2019

However, we argue that the current n-gram-overlap-based measures used as rewards can be improved by using model-based rewards transferred from tasks that directly compare the similarity of sentence pairs.

IMAGE CAPTIONING POLICY GRADIENT METHODS SEMANTIC TEXTUAL SIMILARITY TEXT GENERATION TRANSFER LEARNING
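In policy-gradient text generation the reward is a scalar that weights the log-likelihood of a sampled sentence. A minimal sketch, assuming a clipped bigram-precision score as the n-gram overlap reward; the helper names `ngram_overlap_reward` and `reinforce_sequence_loss` are hypothetical. The model-based rewards argued for above would replace `ngram_overlap_reward` at the same point.

```python
from collections import Counter


def ngram_overlap_reward(candidate, reference, n=2):
    """Hypothetical n-gram overlap reward: clipped n-gram precision of the
    candidate against a single reference (similar to BLEU's modified precision)."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    return sum(min(count, ref[gram]) for gram, count in cand.items()) / total


def reinforce_sequence_loss(token_log_probs, reward, baseline=0.0):
    """Sequence-level REINFORCE surrogate: -(R - b) * sum_t log p(y_t | y_<t).
    With autodiff tensors as inputs, its gradient is the negated policy-gradient
    estimate; with plain floats, as here, only the value is computed."""
    return -(reward - baseline) * sum(token_log_probs)


sampled = "a cat sat on the mat".split()
reference = "the cat sat on the mat".split()
r = ngram_overlap_reward(sampled, reference)
# Hypothetical per-token log-probabilities of the sampled sentence.
loss = reinforce_sequence_loss([-0.9, -0.4, -0.2, -0.3, -0.1, -0.2], r, baseline=0.5)
print(r, loss)
```

Subtracting a baseline (0.5 here, an arbitrary choice) leaves the gradient estimate unbiased while lowering its variance; self-critical training, for example, uses the reward of a greedy decode as that baseline.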

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

7 Sep 2019

Unlike existing policy gradient methods, which employ a single actor-critic and cannot achieve satisfactory tracking-control accuracy or stable learning, our proposed algorithm achieves high tracking-control accuracy for AUVs together with stable learning by applying a hybrid actors-critics architecture, in which multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively.

POLICY GRADIENT METHODS Q-LEARNING

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence

29 Aug 2019

In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate.

POLICY GRADIENT METHODS
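As a much simpler reference point than the neural setting analyzed here: for a tabular softmax policy on a single-state, one-step problem, the natural policy gradient (NPG) step reduces to a multiplicative-weights update, π(a) ← π(a)·exp(η·A(a))/Z. The sketch below iterates that update on hypothetical rewards to show what convergence to the globally optimal (greedy) policy looks like; η, the step count and the rewards are arbitrary assumptions, and the function name is hypothetical.

```python
import math


def npg_softmax_bandit(rewards, eta=0.5, steps=50):
    """Natural policy gradient for a tabular softmax policy on a single-state,
    one-step problem. Under the Fisher metric the step reduces to a
    multiplicative-weights update: pi(a) <- pi(a) * exp(eta * A(a)) / Z,
    with advantage A(a) = r(a) - E_pi[r]."""
    n = len(rewards)
    pi = [1.0 / n] * n  # start from the uniform policy
    for _ in range(steps):
        value = sum(p * r for p, r in zip(pi, rewards))
        weights = [p * math.exp(eta * (r - value)) for p, r in zip(pi, rewards)]
        z = sum(weights)
        pi = [w / z for w in weights]
    return pi


rewards = [1.0, 2.0, 1.5]  # hypothetical per-action rewards
pi = npg_softmax_bandit(rewards)
print([round(p, 6) for p in pi])
```

Because the update is multiplicative, the policy after t steps is proportional to exp(η·t·r(a)), so probability mass concentrates geometrically fast on the best action.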

Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods

8 Aug 2019

This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods.

POLICY GRADIENT METHODS
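The variance problem mentioned above is easy to reproduce on a one-step problem. A minimal sketch, assuming a Bernoulli policy π(a=1) = sigmoid(θ) and hypothetical rewards that share a large constant offset. Subtracting a constant baseline (here the expected reward) leaves the score-function estimate unbiased but shrinks its variance by orders of magnitude. This is the classic value baseline, not the trajectory-wise control variates the paper proposes.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(action):
    # Hypothetical rewards: both actions share a large constant offset (100),
    # which is what inflates the variance of a baseline-free estimator.
    return 101.0 if action == 1 else 100.0


def reinforce_samples(theta, n, baseline=0.0, seed=0):
    """Per-sample score-function (REINFORCE) estimates of dJ/dtheta, where
    J(theta) = E[reward(a)] and the policy is pi(a=1) = sigmoid(theta)."""
    rng = random.Random(seed)
    p = sigmoid(theta)
    samples = []
    for _ in range(n):
        a = 1 if rng.random() < p else 0
        score = a - p  # d/dtheta log pi(a) for a Bernoulli(sigmoid(theta)) policy
        samples.append((reward(a) - baseline) * score)
    return samples


theta = 0.3
p = sigmoid(theta)
true_grad = p * (1 - p) * (reward(1) - reward(0))      # analytic dJ/dtheta
mean_reward = p * reward(1) + (1 - p) * reward(0)      # constant baseline b = E[r]

plain = reinforce_samples(theta, 20000)
with_baseline = reinforce_samples(theta, 20000, baseline=mean_reward)
print(f"true gradient : {true_grad:.4f}")
print(f"no baseline   : mean={statistics.mean(plain):.4f}  var={statistics.variance(plain):.2f}")
print(f"baseline E[r] : mean={statistics.mean(with_baseline):.4f}  var={statistics.variance(with_baseline):.4f}")
```

Both calls reuse the same seed, so the two estimators see identical actions and differ only in the baseline.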

Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning

2 Aug 2019

This paper proposes a definition of system health in the context of multiple agents optimizing a joint reward function.

MULTI-AGENT REINFORCEMENT LEARNING POLICY GRADIENT METHODS

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

1 Aug 2019

However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution (say with a sufficiently rich policy class); how they cope with approximation error due to using a restricted class of parametric policies; or their finite sample behavior.

POLICY GRADIENT METHODS

Hindsight Trust Region Policy Optimization

29 Jul 2019

Motivated by the demand for an effective deep reinforcement learning algorithm that accommodates sparse-reward environments, this paper presents Hindsight Trust Region Policy Optimization (Hindsight TRPO), a method that efficiently utilizes interactions under sparse-reward conditions and maintains learning stability by restricting variance during the policy update process.

POLICY GRADIENT METHODS
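The "hindsight" idea can be illustrated with goal relabeling in the style of Hindsight Experience Replay: a failed episode is replayed as if the goal had been the position it actually reached, so a sparse reward becomes informative. A minimal sketch, assuming a hypothetical one-dimensional goal space, a tolerance-based sparse reward and the "final" relabeling strategy; `Step`, `sparse_reward` and `hindsight_relabel` are illustrative names. Because TRPO is on-policy, relabeled data generally needs a correction such as importance weighting before it can be used in an update; that part is not shown.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Step:
    achieved: float  # position reached after this step (hypothetical 1-D goal space)
    goal: float      # goal the policy was conditioned on
    reward: float


def sparse_reward(achieved, goal, tol=0.05):
    # Hypothetical sparse reward: 0 when within tol of the goal, -1 otherwise.
    return 0.0 if abs(achieved - goal) <= tol else -1.0


def hindsight_relabel(episode: List[Step]) -> List[Step]:
    """'final' strategy: pretend the last achieved position was the goal all
    along and recompute every step's reward under that substituted goal."""
    new_goal = episode[-1].achieved
    return [replace(s, goal=new_goal, reward=sparse_reward(s.achieved, new_goal))
            for s in episode]


goal = 1.0
episode = [Step(a, goal, sparse_reward(a, goal)) for a in (0.2, 0.4, 0.6)]
relabeled = hindsight_relabel(episode)
print([s.reward for s in episode])
print([s.reward for s in relabeled])
```

The original episode earns no reward at all, while the relabeled copy contains one success signal that a learner can exploit.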

Variance Reduction in Actor Critic Methods (ACM)

23 Jul 2019

After presenting Actor Critic Methods (ACM), we show that ACM are control variate estimators.

POLICY GRADIENT METHODS
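A control variate reduces Monte Carlo variance by subtracting a correlated quantity whose expectation is known. A minimal, generic sketch (not the paper's estimator), assuming a toy target E[exp(U)] with U ~ Uniform(0, 1) and control variate g(U) = U with known mean 0.5; in an actor-critic, the critic's value estimate plays the role of g. The coefficient c = Cov(f, g)/Var(g) is the variance-minimizing choice, here estimated from the same samples.

```python
import math
import random
import statistics


def control_variate_estimate(f_vals, g_vals, g_mean):
    """Control-variate estimate of E[f] using g with known expectation g_mean.
    Uses c = Cov(f, g) / Var(g), estimated from the same samples
    (which introduces a small bias that vanishes as the sample grows)."""
    f_bar = statistics.mean(f_vals)
    g_bar = statistics.mean(g_vals)
    cov = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals)) / (len(f_vals) - 1)
    c = cov / statistics.variance(g_vals)
    adjusted = [f - c * (g - g_mean) for f, g in zip(f_vals, g_vals)]
    return statistics.mean(adjusted), statistics.variance(adjusted)


rng = random.Random(1)
u = [rng.random() for _ in range(10000)]
f_vals = [math.exp(x) for x in u]  # target: E[exp(U)] = e - 1
g_vals = u                         # control variate with known mean E[U] = 0.5
cv_mean, cv_var = control_variate_estimate(f_vals, g_vals, 0.5)
print(f"exact        : {math.e - 1:.5f}")
print(f"plain MC     : mean={statistics.mean(f_vals):.5f}  var={statistics.variance(f_vals):.5f}")
print(f"control var. : mean={cv_mean:.5f}  var={cv_var:.5f}")
```

The stronger the correlation between f and g, the larger the reduction: the residual variance is Var(f)·(1 − ρ²), where ρ is their correlation coefficient.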

Entropic Risk Measure in Policy Search

21 Jun 2019

With the increasing pace of automation, modern robotic systems need to act in stochastic, non-stationary, partially observable environments.

POLICY GRADIENT METHODS
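The entropic risk measure of a random return R with risk parameter β is commonly written as (1/β)·log E[exp(βR)]. A minimal sketch of its empirical version, assuming that sign convention (β < 0 risk-averse, β > 0 risk-seeking; some texts flip the sign when working with costs). The sample returns are hypothetical, and how the paper embeds the measure in policy search is not shown.

```python
import math


def entropic_risk(returns, beta):
    """Empirical entropic risk (1/beta) * log(mean(exp(beta * R))).
    beta < 0 penalizes spread (risk-averse), beta > 0 rewards it
    (risk-seeking), and beta -> 0 recovers the plain mean."""
    if beta == 0:
        return sum(returns) / len(returns)
    scaled = [beta * r for r in returns]
    m = max(scaled)  # log-sum-exp shift for numerical stability
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


safe = [1.0, 1.0, 1.0, 1.0]     # hypothetical returns, mean 1
risky = [-2.0, 4.0, -2.0, 4.0]  # same mean, larger spread
for beta in (-1.0, 0.0, 1.0):
    print(beta, entropic_risk(safe, beta), entropic_risk(risky, beta))
```

Both return samples have the same mean, so only the risk-sensitive settings (β ≠ 0) tell them apart: the risky policy scores below the safe one for β = −1 and above it for β = 1.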

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

19 Jun 2019

Under a further strict saddle point assumption, this result establishes convergence to essentially locally optimal policies of the underlying problem, thus bridging a gap in the existing literature on the convergence of PG methods.

AUTONOMOUS DRIVING POLICY GRADIENT METHODS