no code implementations • NeurIPS 2008 • John W. Roberts, Russ Tedrake
Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, but their learning performance is typically limited by a large variance in the estimate of the gradient.