no code implementations • 1 Jan 2021 • Egor Rotinov
This paper introduces an off-policy reinforcement learning method that uses the Hellinger distance between the sampling policy and the current policy as a constraint.
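To make the constraint concrete, here is a minimal sketch of the Hellinger distance between two discrete action distributions, using the standard definition H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2). The function names and the threshold `delta` are illustrative assumptions, not taken from the paper, and the paper's actual optimization procedure is not reproduced here.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are sequences of probabilities over the same action set.
    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def satisfies_constraint(sampling_probs, current_probs, delta):
    """Hypothetical helper: check that the current policy stays within
    Hellinger distance `delta` of the sampling policy for one state."""
    return hellinger_distance(sampling_probs, current_probs) <= delta
```

Because the distance is bounded by 1, a threshold such as `delta` has a fixed scale regardless of the number of actions, which is one reason a bounded distance can be convenient as a constraint.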
no code implementations • 19 Oct 2019 • Egor Rotinov
This paper describes an improvement to Deep Q-learning called Reverse Experience Replay (RER), which addresses the problem of sparse rewards and helps with reward-maximizing tasks by sampling transitions successively in reverse order.
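The idea of sampling transitions successively in reverse order can be sketched as a replay buffer that returns a contiguous run of stored transitions, newest first. This is only an illustration under stated assumptions: the class name `ReverseReplayBuffer`, the method names, and the choice of a random end point are hypothetical and may differ from the scheme in the paper.

```python
import random


class ReverseReplayBuffer:
    """Illustrative buffer that yields consecutive transitions in reverse order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []  # oldest transition first

    def add(self, transition):
        """Append a transition, dropping the oldest one when full."""
        self.storage.append(transition)
        if len(self.storage) > self.capacity:
            self.storage.pop(0)

    def sample_reverse(self, batch_size, rng=random):
        """Return `batch_size` consecutive transitions, latest first.

        The end of the run is chosen at random (an assumption of this
        sketch), then the run is reversed so that a transition near the
        reward is processed before the transitions that led up to it.
        """
        if batch_size > len(self.storage):
            raise ValueError("not enough transitions stored")
        end = rng.randint(batch_size, len(self.storage))
        return self.storage[end - batch_size:end][::-1]
```

Processing a run in this order lets a Q-value update at a rewarded transition feed into the updates for the preceding transitions within the same batch, which is the intuition behind using reverse order when rewards are sparse.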