1 code implementation • ICLR 2021 • Benjamin Eysenbach, Swapnil Asawa, Shreyas Chaudhari, Sergey Levine, Ruslan Salakhutdinov
Building on a probabilistic view of RL, we formally show that we can achieve this goal by modifying the reward function to compensate for the difference in dynamics.
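As a rough illustration of what a reward modification that compensates for a dynamics difference could look like, the sketch below adds a log-likelihood-ratio term to the reward. This specific form, the 1-D Gaussian dynamics, and all function names (`source_log_prob`, `target_log_prob`, `corrected_reward`) are assumptions made for the example and are not taken from the excerpt above. In practice the two transition densities are rarely known in closed form and would have to be estimated.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def source_log_prob(s, a, s_next):
    # Hypothetical source-domain dynamics: s' ~ N(s + a, 0.1^2).
    return gaussian_log_pdf(s_next, s + a, 0.1)


def target_log_prob(s, a, s_next):
    # Hypothetical target-domain dynamics: actions have half the effect,
    # s' ~ N(s + 0.5 * a, 0.1^2).
    return gaussian_log_pdf(s_next, s + 0.5 * a, 0.1)


def corrected_reward(reward, s, a, s_next):
    """Reward plus an assumed correction term:
    delta_r = log p_target(s' | s, a) - log p_source(s' | s, a).
    Transitions that are unlikely under the target dynamics are penalized."""
    delta_r = target_log_prob(s, a, s_next) - source_log_prob(s, a, s_next)
    return reward + delta_r


if __name__ == "__main__":
    # A source-domain transition that the target dynamics would rarely produce.
    print(corrected_reward(1.0, s=0.0, a=1.0, s_next=1.0))
    # A transition that matches the target dynamics well.
    print(corrected_reward(1.0, s=0.0, a=1.0, s_next=0.5))
```

Under these assumed dynamics, an agent trained in the source domain on the corrected reward is pushed away from transitions the target domain would not produce, and the correction vanishes wherever the two dynamics agree.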