208 papers with code • 73 benchmarks • 7 datasets
Integrating model-free and model-based approaches in reinforcement learning has the potential to combine the high performance of model-free algorithms with the low sample complexity of model-based ones.
When evaluated on a number of continuous control tasks, Trust-PCL improves the solution quality and sample efficiency of TRPO.
In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment.
In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that uses standard supervised learning methods as subroutines.
Ranked #1 on OpenAI Gym on HalfCheetah-v2
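The idea of using "standard supervised learning methods as subroutines" can be made concrete with a small sketch. It assumes one particular instantiation that is *not* confirmed by the snippet above: an advantage-weighted regression step. A value baseline is first fit to observed returns by ordinary regression. Each sample is then weighted by its clipped exponentiated advantage, and the policy mean is fit to the actions taken by weighted regression. The linear models and the helper names (`fit_linear`, `predict_linear`, `advantage_weighted_step`) are hypothetical, as are the parameters `beta` and `max_weight`. Weighted least squares stands in for whatever supervised learner would actually be used.

```python
import numpy as np

def fit_linear(features, targets, weights=None):
    """Weighted least squares with a bias term (hypothetical stand-in for a supervised learner)."""
    n = features.shape[0]
    X = np.hstack([features, np.ones((n, 1))])   # append a bias column
    targets = targets.reshape(n, -1)             # always treat targets as 2-D (n, d)
    if weights is None:
        weights = np.ones(n)
    sw = np.sqrt(weights)[:, None]               # scale rows by sqrt(weight)
    coef, *_ = np.linalg.lstsq(X * sw, targets * sw, rcond=None)
    return coef                                  # shape (feature_dim + 1, d)

def predict_linear(coef, features):
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ coef

def advantage_weighted_step(states, actions, returns, beta=1.0, max_weight=20.0):
    """One iteration of an advantage-weighted regression style update (assumed form).

    states: (n, state_dim), actions: (n, action_dim), returns: (n,)
    """
    # 1) Supervised value regression: fit V(s) to the observed returns.
    v_coef = fit_linear(states, returns)
    values = predict_linear(v_coef, states)[:, 0]
    # 2) Advantages and exponentiated weights, clipped so a few samples cannot dominate.
    advantages = returns - values
    weights = np.minimum(np.exp(advantages / beta), max_weight)
    # 3) Supervised policy regression: weighted fit of the action mean to taken actions.
    pi_coef = fit_linear(states, actions, weights)
    return v_coef, pi_coef, weights
```

In use, one would collect `(states, actions, returns)` from rollouts, call `advantage_weighted_step`, and act with `predict_linear(pi_coef, s)` plus exploration noise before collecting the next batch. Both steps reduce to off-the-shelf regression, which is the property the snippet emphasizes. The reinforcement signal enters only through the sample weights.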
Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks.
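What "directly optimize the cumulative reward" means can be shown with a minimal sketch of the generic score-function (REINFORCE-style) policy-gradient estimator. This is the textbook estimator, not the specific method of the paper quoted above. It computes discounted rewards-to-go G_t and accumulates grad log pi(a_t | s_t) * G_t over one episode. For brevity, the policy is assumed to be a linear softmax over discrete actions. A neural network would replace `s @ theta` in practice. The function names and the `gamma` default are illustrative assumptions.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = sum_k gamma^k * r_{t+k} for a single episode."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """Monte Carlo policy-gradient estimate for pi(a|s) = softmax(s @ theta)[a].

    theta: (state_dim, n_actions). Returns sum_t grad_theta log pi(a_t|s_t) * G_t.
    """
    returns = discounted_returns(rewards, gamma)
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(s @ theta)
        one_hot = np.zeros_like(probs)
        one_hot[a] = 1.0
        # Gradient of log-softmax w.r.t. theta is outer(s, onehot(a) - probs).
        grad += np.outer(s, one_hot - probs) * g
    return grad
```

A training loop would sample episodes with the current policy and take ascent steps `theta += learning_rate * reinforce_gradient(...)`. This raises the expected discounted return without any model of the environment. Estimates of this kind are unbiased but can have high variance, which is why baselines or advantage estimates are commonly subtracted from G_t.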