Policy Gradient Methods

89 papers with code • 0 benchmarks • 2 datasets

Policy gradient methods are a family of reinforcement learning algorithms that optimize a parameterized policy directly, by gradient ascent on the expected return, rather than deriving the policy from a learned value function. Classic examples include REINFORCE, natural policy gradient, TRPO, and PPO.
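As a minimal illustration of the core idea (not any particular paper's method), the sketch below implements tabular REINFORCE with a softmax policy. The `env` object with `reset()`/`step()` is a hypothetical Gym-style stand-in, and all names are illustrative.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce(env, n_states, n_actions, episodes=1000, lr=0.01, gamma=0.99):
    """Tabular REINFORCE sketch; `env` is a hypothetical Gym-style environment
    whose step() returns (next_state, reward, done)."""
    theta = np.zeros((n_states, n_actions))  # policy logits per state
    for _ in range(episodes):
        states, actions, rewards = [], [], []
        s = env.reset()
        done = False
        while not done:
            probs = softmax(theta[s])
            a = np.random.choice(n_actions, p=probs)
            s, r, done = env.step(a)
            states.append(states[-1] if False else s)  # placeholder avoided below
        # Collect the trajectory properly:
        return theta
```

Actually, a cleaner and correct version of the episode loop:

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce(env, n_states, n_actions, episodes=1000, lr=0.01, gamma=0.99):
    """Tabular REINFORCE sketch; `env` is a hypothetical Gym-style environment
    whose step() returns (next_state, reward, done)."""
    theta = np.zeros((n_states, n_actions))  # policy logits per state
    for _ in range(episodes):
        states, actions, rewards = [], [], []
        s = env.reset()
        done = False
        while not done:
            probs = softmax(theta[s])
            a = np.random.choice(n_actions, p=probs)
            s_next, r, done = env.step(a)
            states.append(s); actions.append(a); rewards.append(r)
            s = s_next
        # Monte Carlo return G_t, then ascend grad log pi; for a softmax
        # policy, grad_theta[s] log pi(a|s) = one_hot(a) - pi(.|s).
        G = 0.0
        for t in reversed(range(len(rewards))):
            G = rewards[t] + gamma * G
            probs = softmax(theta[states[t]])
            grad_log_pi = -probs
            grad_log_pi[actions[t]] += 1.0
            theta[states[t]] += lr * (gamma ** t) * G * grad_log_pi
    return theta
```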

Libraries

Use these libraries to find Policy Gradient Methods models and implementations
See all 7 libraries.

Latest papers with no code

Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate

no code yet • 1 Mar 2024

The efficient utilization of historical trajectories obtained from previous policies is essential for expediting policy optimization.
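For context, the standard trajectory-level importance-sampling correction that makes old trajectories reusable looks as follows; this is a generic estimator, not necessarily the exact one analyzed in the paper:

$$
\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N} w_i(\theta) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i)\, G_t^i,
\qquad
w_i(\theta) = \prod_{t=0}^{T-1} \frac{\pi_\theta(a_t^i \mid s_t^i)}{\pi_{\theta_{\mathrm{old}}}(a_t^i \mid s_t^i)},
$$

where trajectories $i = 1, \dots, N$ were collected under a previous policy $\pi_{\theta_{\mathrm{old}}}$ and $G_t^i$ is the return from step $t$.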

When Do Off-Policy and On-Policy Policy Gradient Methods Align?

no code yet • 19 Feb 2024

A well-established off-policy objective is the excursion objective.
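The excursion objective evaluates the target policy under the state distribution of the behavior policy (in the style of Degris et al.'s off-policy actor-critic):

$$
J_\mu(\theta) = \sum_{s} d_\mu(s)\, v_{\pi_\theta}(s),
$$

where $d_\mu$ is the stationary state distribution of the behavior policy $\mu$ and $v_{\pi_\theta}$ is the value function of the target policy $\pi_\theta$.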

Identifying Policy Gradient Subspaces

no code yet • 12 Jan 2024

Policy gradient methods hold great potential for solving complex continuous control tasks.

Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction

no code yet • 2 Jan 2024

Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning.
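For reference, the basic NPG update preconditions the policy gradient with the (pseudo-)inverse of the Fisher information matrix:

$$
\theta_{k+1} = \theta_k + \eta_k\, F(\theta_k)^{\dagger}\, \nabla_\theta J(\theta_k),
\qquad
F(\theta) = \mathbb{E}_{s,a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top}\right].
$$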

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

no code yet • 19 Dec 2023

Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning.

Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains

no code yet • 9 Dec 2023

To tackle this challenge, we propose a game-theoretic, privacy-preserving mechanism, utilizing a secure multi-party computation (MPC) framework in MARL settings.

RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation

no code yet • 8 Dec 2023

Further, the recent work on Denoising Diffusion Policy Optimization (DDPO) shows that the diffusion process is compatible with policy gradient methods, which it uses to improve 2D diffusion models with an aesthetic scoring function.
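Schematically, DDPO treats the $T$-step denoising chain as an MDP and applies a REINFORCE-style score-function gradient; with a reward $r$ on the final sample $x_0$ and context $c$, the estimator has the form

$$
\nabla_\theta J(\theta) = \mathbb{E}\!\left[\, r(x_0, c) \sum_{t=1}^{T} \nabla_\theta \log p_\theta(x_{t-1} \mid x_t, c) \right].
$$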

Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems

no code yet • 5 Dec 2023

As a second contribution, we show that, under appropriate assumptions, a SAGE-based policy-gradient method converges to an optimal policy with high probability, provided it starts sufficiently close to one, even with a nonconvex objective function and multiple maximizers.

A Large Deviations Perspective on Policy Gradient Algorithms

no code yet • 13 Nov 2023

Motivated by policy gradient methods in the context of reinforcement learning, we derive the first large deviation rate function for the iterates generated by stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Lojasiewicz condition.
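The Polyak-Lojasiewicz (PL) condition referenced here requires, for some $\mu > 0$ and all $x$,

$$
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\big(f(x) - f^*\big),
$$

where $f^*$ is the optimal value; it guarantees that every stationary point is globally optimal without requiring convexity.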

On the Second-Order Convergence of Biased Policy Gradient Algorithms

no code yet • 5 Nov 2023

Since the objective functions of reinforcement learning problems are typically highly nonconvex, it is desirable that policy gradient, the most popular algorithm, escapes saddle points and arrives at second-order stationary points.
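A common formalization (stated here for minimizing $f$ with a $\rho$-Lipschitz Hessian; the maximization case is symmetric): $x$ is an $\epsilon$-second-order stationary point if

$$
\|\nabla f(x)\| \le \epsilon
\quad\text{and}\quad
\lambda_{\min}\!\big(\nabla^2 f(x)\big) \ge -\sqrt{\rho\,\epsilon},
$$

so the algorithm must not only drive the gradient to zero but also escape strict saddle points, where the Hessian has a significantly negative eigenvalue.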