Ensembles of neural networks are often used to estimate epistemic uncertainty in high-dimensional problems because of their scalability and ease of use.
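The excerpt does not detail the ensemble construction; a minimal sketch of the usual deep-ensemble recipe (independently initialised members, disagreement read as epistemic uncertainty) might look like the following, where the network sizes and member count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(rng, in_dim=1, hidden=32):
    # Randomly initialised two-layer MLP; each ensemble member
    # gets its own independent weights.
    W1 = rng.normal(0.0, 1.0, (in_dim, hidden))
    b1 = rng.normal(0.0, 0.1, hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    return (W1, b1, W2)

def predict(net, x):
    W1, b1, W2 = net
    h = np.tanh(x @ W1 + b1)
    return h @ W2

# An ensemble of independently initialised (and, in practice,
# independently trained) networks.
ensemble = [make_net(rng) for _ in range(5)]

x = np.linspace(-3.0, 3.0, 7).reshape(-1, 1)
preds = np.stack([predict(net, x) for net in ensemble])  # (members, points, 1)

mean = preds.mean(axis=0)          # ensemble prediction
epistemic_std = preds.std(axis=0)  # high where members disagree
```

The per-input standard deviation across members is the scalable uncertainty estimate the excerpt alludes to: no Bayesian inference machinery, just repeated training runs.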
When designing controllers for safety-critical systems, practitioners often face a challenging tradeoff between robustness and performance.
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure).
The Deep Q-Network proposed by Mnih et al. has become a benchmark and building block for much deep reinforcement learning research.
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning.
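The excerpt names MAC but not its update rule. A plausible sketch, consistent with the name: instead of estimating the policy gradient from a single sampled action, average the critic's Q-values over all discrete actions under the current policy. The softmax parameterisation and the closed-form gradient below are illustrative assumptions:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def mean_actor_critic_grad(logits, q_values):
    """Gradient of sum_a pi(a|s) Q(s,a) w.r.t. softmax logits.

    Averages over ALL actions (hence "mean") rather than using a
    single sampled action, reducing the variance of the estimate.
    """
    pi = softmax(logits)
    baseline = pi @ q_values           # expected Q under the policy
    # d/dlogits of sum_a pi_a Q_a = pi * (Q - E_pi[Q])
    return pi * (q_values - baseline)

logits = np.zeros(3)                   # uniform initial policy
q = np.array([1.0, 2.0, 3.0])          # critic's action values
grad = mean_actor_critic_grad(logits, q)
# Components sum to zero; probability mass shifts toward higher-Q actions.
```

With a uniform policy the gradient is proportional to each action's advantage over the mean Q-value, so gradient ascent on the logits increases the probability of the highest-valued action.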