20 May 2023 • Naman Saxena, Subhojyoti Khastigir, Shishir Kolathaya, Shalabh Bhatnagar
In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion.
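For orientation, the average reward criterion and the general shape of an on-policy deterministic policy gradient can be sketched as follows. This is a sketch under standard assumptions, not the paper's exact statement: the symbols $\mu_\theta$ (deterministic policy), $\rho$ (average reward), $Q^{\mu_\theta}$ (differential action-value function), and $d^{\mu_\theta}$ (stationary state distribution) are notation assumed here, and the precise conditions and the off-policy variant are those given in the paper itself.

```latex
% Average reward of a deterministic policy \mu_\theta (assumed notation)
\rho(\mu_\theta) = \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\!\left[ \sum_{t=0}^{T-1} r\bigl(s_t, \mu_\theta(s_t)\bigr) \right]

% Differential action-value function: rewards measured relative to \rho
Q^{\mu_\theta}(s, a) = \mathbb{E}\!\left[ \sum_{t=0}^{\infty}
  \bigl( r(s_t, a_t) - \rho(\mu_\theta) \bigr) \,\middle|\, s_0 = s,\ a_0 = a \right]

% Typical form of an on-policy deterministic policy gradient
% (assumes an ergodic chain with stationary distribution d^{\mu_\theta})
\nabla_\theta \rho(\mu_\theta) = \int_{\mathcal{S}} d^{\mu_\theta}(s)\,
  \nabla_\theta \mu_\theta(s)\,
  \nabla_a Q^{\mu_\theta}(s, a)\big|_{a = \mu_\theta(s)} \, \mathrm{d}s
```

Unlike the discounted setting, no discount factor appears: the differential value function subtracts $\rho(\mu_\theta)$ from each reward so that the infinite sum remains finite.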