no code implementations • 14 Jun 2022 • Maxim Kaledin, Alexander Golubev, Denis Belomestny
Policy-gradient methods in reinforcement learning (RL) are broadly applicable and widely used in practice, but their performance suffers from the high variance of the gradient estimate.