MuJoCo
278 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Simple random search provides a competitive approach to reinforcement learning
A common belief in model-free reinforcement learning is that methods based on random search in the parameter space of policies exhibit significantly worse sample complexity than those that explore the space of actions.
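The method in this entry searches directly in policy-parameter space. Below is a minimal sketch of its basic update, θ ← θ + α/N Σ_k [r(θ + νδ_k) − r(θ − νδ_k)] δ_k. It assumes a hypothetical quadratic `reward` function in place of a MuJoCo rollout return. The state normalisation and reward-standard-deviation scaling of the paper's augmented variants are omitted, and the hyperparameters are illustrative.

```python
import random


def reward(theta):
    # Hypothetical stand-in for an episode return; maximised at theta = (1, -2).
    target = (1.0, -2.0)
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def basic_random_search(theta, step_size=0.1, noise=0.1, n_directions=8, iters=300, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        update = [0.0] * len(theta)
        for _ in range(n_directions):
            # Sample a Gaussian search direction in parameter space.
            delta = [rng.gauss(0.0, 1.0) for _ in theta]
            plus = [t + noise * d for t, d in zip(theta, delta)]
            minus = [t - noise * d for t, d in zip(theta, delta)]
            # Weight the direction by the return difference of the antithetic pair.
            diff = reward(plus) - reward(minus)
            update = [u + diff * d for u, d in zip(update, delta)]
        # Average over directions and take one step.
        theta = [t + step_size / n_directions * u for t, u in zip(theta, update)]
    return theta


print(basic_random_search([0.0, 0.0]))  # should approach [1.0, -2.0]
```

Only scalar returns are used, so no action-space gradients or value functions are needed. That is what makes the sample-complexity comparison against action-space methods interesting.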
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients.
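The ES update described here proceeds in three steps:

1. Sample Gaussian perturbations of the parameters.
2. Score each perturbed vector with the black-box objective.
3. Move along the noise directions, weighted by their scores: θ ← θ + α/(nσ) Σ_i F_i ε_i.

The sketch below is a minimal single-process version. It assumes a hypothetical quadratic `fitness` function in place of a MuJoCo episode return, and it applies a centered-rank transform to the scores as a form of fitness shaping. All hyperparameters are illustrative.

```python
import random


def fitness(theta):
    # Hypothetical black-box objective; maximised at theta = (3, 0.5).
    return -((theta[0] - 3.0) ** 2 + (theta[1] - 0.5) ** 2)


def centered_ranks(values):
    # Replace raw scores by evenly spaced values in [-0.5, 0.5] ordered by rank.
    # Assumes at least two values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank / (len(values) - 1) - 0.5
    return ranks


def evolution_strategies(theta, sigma=0.1, lr=0.02, population=50, iters=300, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        # One Gaussian perturbation per population member.
        eps = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(population)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, ei)]) for ei in eps]
        shaped = centered_ranks(scores)
        # Gradient estimate: (1 / (n * sigma)) * sum_i F_i * eps_i
        grad = [sum(f * ei[j] for f, ei in zip(shaped, eps)) / (population * sigma)
                for j in range(len(theta))]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


print(evolution_strategies([0.0, 0.0]))  # should approach [3.0, 0.5]
```

Each perturbation contributes only a scalar score, so the population evaluations parallelise with very little communication. This is the property that lets ES scale across many workers.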
The StarCraft Multi-Agent Challenge
In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap: the lack of a standard benchmark for cooperative multi-agent RL.
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
In this paper, we extend the theory of trust region learning to MARL.
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance.
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.
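The Kronecker-factored approximation can be summarised for a single fully connected layer. This minimal derivation assumes the following:

- The layer is s = W a, with input activations a and back-propagated gradients g = ∂L/∂s.
- F is that layer's Fisher block, with the expectation taken over examples.
- As in K-FAC, the activation and gradient statistics can be treated as independent.

```latex
\begin{aligned}
\nabla_W L &= g\,a^\top, \qquad \operatorname{vec}(\nabla_W L) = a \otimes g,\\
F &= \mathbb{E}\!\left[\operatorname{vec}(\nabla_W L)\,\operatorname{vec}(\nabla_W L)^\top\right]
   = \mathbb{E}\!\left[(a a^\top) \otimes (g g^\top)\right]
   \approx \underbrace{\mathbb{E}[a a^\top]}_{A} \otimes \underbrace{\mathbb{E}[g g^\top]}_{G},\\
F^{-1}\operatorname{vec}(\nabla_W L) &\approx (A^{-1} \otimes G^{-1})\operatorname{vec}(\nabla_W L)
   = \operatorname{vec}\!\left(G^{-1}\,\nabla_W L\,A^{-1}\right).
\end{aligned}
```

The last step uses the identity vec(BXC) = (Cᵀ ⊗ B) vec(X) together with the symmetry of A. Only the two small factors A and G need to be inverted, rather than the full Fisher block, and this is what keeps the curvature approximation affordable for deep networks.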
DeepMind Control Suite
The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents.
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
robosuite is a simulation framework for robot learning powered by the MuJoCo physics engine.
Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods for continuous-action DRL benchmarks.
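REDQ keeps a high UTD ratio without a learned model. It trains an ensemble of N Q-functions and bootstraps from the minimum over a randomly chosen subset of M of them. The sketch below computes only that target for one transition, in a soft actor-critic style with an entropy term. The argument names, the default subset size, and the temperature `alpha` are illustrative assumptions. The surrounding training loop is not shown; in that loop, this target is reused for several critic updates per environment step.

```python
import random


def redq_target(reward, done, next_q_values, next_log_prob,
                gamma=0.99, alpha=0.2, subset_size=2, rng=random):
    # next_q_values: Q_i(s', a') from each of the N target critics, with a' ~ pi(.|s').
    # A fresh random subset of size M is drawn for every target computation.
    subset = rng.sample(next_q_values, subset_size)
    min_q = min(subset)
    # Entropy-regularised bootstrap, switched off at terminal states (done = 1).
    return reward + gamma * (1.0 - done) * (min_q - alpha * next_log_prob)


# Example: N = 10 critics, M = 2 sampled per target (illustrative sizes).
rng = random.Random(0)
q_next = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]
print(redq_target(1.0, 0.0, q_next, next_log_prob=-1.0, rng=rng))
```

Taking the minimum over a small random subset, rather than over all N critics, controls overestimation bias without making the target overly pessimistic. This is what allows many updates per sample.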
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
Theoretically, we show that SQIL can be interpreted as a regularized variant of behavioral cloning (BC) that uses a sparsity prior to encourage long-horizon imitation.
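SQIL's core mechanism is a reward relabelling applied before off-policy (soft) Q-learning:

- Demonstration transitions are stored with reward 1.
- The agent's own transitions are stored with reward 0.
- Each training batch draws from both sources.

Below is a minimal sketch of the batch construction. It assumes transitions are hypothetical `(state, action, next_state, done)` tuples and uses an even 50/50 split. The Q-learning update itself is not shown.

```python
import random


def sqil_batch(demo_transitions, agent_transitions, batch_size, rng=random):
    # Half of the batch comes from demonstrations, the rest from agent experience.
    half = batch_size // 2
    # Relabel: demonstrations get reward 1.0, the agent's own transitions get 0.0.
    demo = [(s, a, 1.0, s2, d) for (s, a, s2, d) in rng.sample(demo_transitions, half)]
    agent = [(s, a, 0.0, s2, d)
             for (s, a, s2, d) in rng.sample(agent_transitions, batch_size - half)]
    return demo + agent


demos = [(i, 0, i + 1, False) for i in range(10)]
agent = [(i, 1, i + 1, False) for i in range(100, 110)]
batch = sqil_batch(demos, agent, 8, rng=random.Random(0))
print([r for (_, _, r, _, _) in batch])  # four 1.0 rewards followed by four 0.0 rewards
```

Because the only non-zero reward comes from matching demonstrated behaviour, the agent is pushed back toward demonstrated states. This constant sparse reward is what the "sparsity prior" in the BC interpretation refers to.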