# All-Action Policy Gradient Methods: A Numerical Integration Approach

21 Oct 2019 · Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon

While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] that significantly reduces variance compared to the REINFORCE estimator [Williams, 1992].
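To make the contrast concrete, the following is a minimal sketch rather than the authors' method. It assumes a single-state problem with a small discrete action set, a softmax policy with one logit per action, and action values `q` that are known exactly. The names `softmax`, `all_action_gradient`, and `reinforce_samples` are hypothetical helpers introduced only for illustration. With a discrete action set the integral over actions becomes a finite sum, so the all-action gradient can be computed exactly, while REINFORCE scores one sampled action at a time.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def all_action_gradient(theta, q):
    """Sum over every action a of grad_theta pi(a) * q(a).

    For a softmax policy, d pi(a) / d theta_j = pi(a) * (1[a == j] - pi(j)),
    so the sum has the closed form pi * (q - pi . q).
    """
    pi = softmax(theta)
    return pi * (q - pi @ q)


def reinforce_samples(theta, q, n, rng):
    """Draw n single-sample REINFORCE estimates grad log pi(a) * q(a), a ~ pi."""
    pi = softmax(theta)
    actions = rng.choice(len(pi), size=n, p=pi)
    # Score function of a softmax policy: one_hot(a) - pi.
    scores = np.eye(len(pi))[actions] - pi
    return scores * q[actions][:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.array([0.5, -0.2, 0.1])   # assumed policy logits
    q = np.array([1.0, 2.0, -1.0])       # assumed exact action values

    exact = all_action_gradient(theta, q)
    samples = reinforce_samples(theta, q, n=100_000, rng=rng)

    print("all-action gradient:     ", exact)
    print("mean of REINFORCE draws: ", samples.mean(axis=0))
    print("per-component variance:  ", samples.var(axis=0))
```

Both estimators target the same gradient in expectation. In this toy setting the all-action sum has no sampling noise over actions at all, whereas each REINFORCE draw varies with the sampled action. In practice the action values are themselves estimated and the action space may be continuous, so the integral has to be approximated rather than summed exactly.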

