no code implementations • 23 Jun 2023 • Fengdi Che, Gautham Vasan, A. Rupam Mahmood
The policy gradient theorem gives a convenient form of the policy gradient in terms of three factors: an action value, a gradient of the action likelihood, and a state distribution involving discounting called the \emph{discounted stationary distribution}.
no code implementations • 1 Oct 2022 • Fengdi Che, Xiru Zhu, Doina Precup, David Meger, Gregory Dudek
Guided exploration with expert demonstrations improves data efficiency for reinforcement learning, but current algorithms often overuse expert information.
no code implementations • 2 Dec 2019 • Xiru Zhu, Fengdi Che, Tianzi Yang, Tzuyang Yu, David Meger, Gregory Dudek
This is because the task of evaluating the quality of a generated image differs from deciding if an image is real or fake.