Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

7 Nov 2016 · Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine

Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is its high sample complexity...
