Policy Gradient Methods
89 papers with code • 0 benchmarks • 2 datasets
Latest papers with no code
Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate
The efficient utilization of historical trajectories obtained from previous policies is essential for expediting policy optimization.
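The mechanism behind reusing old trajectories is importance sampling: returns collected under a previous behavior policy are reweighted by the likelihood ratio between the current and old policies. A minimal sketch of this idea (the unit-variance Gaussian policies, toy reward, and parameter values here are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prob(theta, actions):
    # Log-density (up to a constant) of actions under a unit-variance
    # Gaussian policy with mean theta.
    return -0.5 * np.sum((actions - theta) ** 2, axis=-1)

theta_old, theta_new = 0.0, 0.5
actions = rng.normal(theta_old, 1.0, size=(1000, 1))  # samples from the old policy
returns = -(actions.squeeze() - 1.0) ** 2             # toy reward: prefer actions near 1

# Importance weights correct for the mismatch between old and new policies.
w = np.exp(log_prob(theta_new, actions) - log_prob(theta_old, actions))

# Reweighted policy-gradient estimate: E[w * return * grad log pi_new].
grad_log = actions.squeeze() - theta_new              # d/dtheta of log N(a; theta, 1)
grad_estimate = np.mean(w * returns * grad_log)
print(grad_estimate)
```

The variance of such estimates grows as the new policy drifts away from the old one, which is why the convergence analysis of reuse schemes hinges on controlling these weights.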
When Do Off-Policy and On-Policy Policy Gradient Methods Align?
A well-established off-policy objective is the excursion objective.
Identifying Policy Gradient Subspaces
Policy gradient methods hold great potential for solving complex continuous control tasks.
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning.
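NPG preconditions the vanilla policy gradient with the inverse Fisher information matrix, so updates are taken in the geometry of the policy distribution rather than of the raw parameters. A small self-contained sketch on a 3-armed bandit with a softmax policy (the reward values, baseline, regularizer, and step size are assumptions for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
theta = np.zeros(3)                    # logits of a 3-action softmax policy
rewards = np.array([0.1, 0.5, 0.9])    # toy bandit: action 2 is best
lr, eps, baseline = 0.1, 1e-3, 0.0

for _ in range(1000):
    p = softmax(theta)
    a = rng.choice(3, p=p)
    r = rewards[a]
    grad_log = -p.copy()
    grad_log[a] += 1.0                               # grad_theta log pi(a)
    g = (r - baseline) * grad_log                    # REINFORCE gradient with baseline
    baseline += 0.1 * (r - baseline)                 # running-average baseline
    F = np.diag(p) - np.outer(p, p) + eps * np.eye(3)  # Fisher information, regularized
    theta += lr * np.linalg.solve(F, g)              # natural gradient step

print(softmax(theta))
```

The `eps` ridge term is needed because the exact softmax Fisher matrix is singular along the all-ones direction; variance-reduction schemes such as the Hessian-aided momentum studied here aim to reduce the noise of the `g` estimate that this update consumes.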
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property
Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning.
Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains
To tackle this challenge, we propose a game-theoretic, privacy-preserving mechanism, utilizing a secure multi-party computation (MPC) framework in MARL settings.
RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation
Further, the recent work on Denoising Diffusion Policy Optimization (DDPO) shows that the diffusion process is compatible with policy gradient methods and can improve 2D diffusion models using an aesthetic scoring function.
Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems
As a second contribution, we show that, under appropriate assumptions, the policy under a SAGE-based policy-gradient method has a large probability of converging to an optimal policy, provided that it starts sufficiently close to it, even with a nonconvex objective function and multiple maximizers.
A Large Deviations Perspective on Policy Gradient Algorithms
Motivated by policy gradient methods in the context of reinforcement learning, we derive the first large deviation rate function for the iterates generated by stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Lojasiewicz condition.
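The Polyak-Lojasiewicz (PL) condition is strictly weaker than convexity: it only requires the squared gradient norm to dominate the suboptimality gap, yet it still forces gradient methods toward the global minimum. A standard nonconvex-but-PL example, run through plain SGD (the noise level and step size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# f(x) = x^2 + 3*sin(x)^2 is nonconvex but satisfies a PL inequality
# 0.5 * f'(x)^2 >= mu * (f(x) - f*), with f* = 0 attained at x = 0.
f = lambda x: x**2 + 3.0 * np.sin(x)**2
grad = lambda x: 2.0 * x + 3.0 * np.sin(2.0 * x)

x = 3.0
for _ in range(2000):
    g = grad(x) + rng.normal(0.0, 0.1)  # stochastic gradient: true gradient + noise
    x -= 0.01 * g                        # plain SGD step

print(x, f(x))
```

Under the PL condition the iterates contract toward the optimum at a geometric rate up to a noise floor, which is the regime in which the large deviation rate function for the iterates is derived.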
On the Second-Order Convergence of Biased Policy Gradient Algorithms
Since the objective functions of reinforcement learning problems are typically highly nonconvex, it is desirable that policy gradient, the most popular algorithm, escapes saddle points and arrives at second-order stationary points.
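The distinction matters because a first-order stationary point can be a strict saddle. A small sketch (illustrative, not the paper's algorithm) shows how gradient noise pushes iterates off a strict saddle and down to a second-order stationary point; the test function, noise scale, and step size are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# f(x, y) = x^4/4 - x^2/2 + y^2 has a strict saddle at the origin
# (Hessian eigenvalues -1 and 2) and global minima at (+-1, 0).
grad = lambda p: np.array([p[0]**3 - p[0], 2.0 * p[1]])

p = np.zeros(2)                                  # start exactly at the saddle
for _ in range(3000):
    g = grad(p) + rng.normal(0.0, 0.01, size=2)  # perturbed gradient estimate
    p -= 0.05 * g

print(p)  # |x| should approach 1 and y approach 0: a second-order stationary point
```

Exact gradient descent started at the origin would stay there forever; the perturbation supplies a component along the Hessian's negative eigendirection, which the dynamics then amplify.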