Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

30 Oct 2018Mahammad HumayooXueqi Cheng

Off-policy learning is more unstable compared to on-policy learning in reinforcement learning (RL). One reason for the instability of off-policy learning is a discrepancy between the target ($\pi$) and behavior (b) policy distributions... (read more)

PDF Abstract


No code implementations yet. Submit your code now


Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.