However, these methods are notorious for requiring enormous amounts of training data, which is prohibitively expensive to collect on real robots.
The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data.
This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators.
Domain randomization methods tackle this problem by randomizing the physics simulator (the source domain) during training according to a distribution over domain parameters. The resulting policies are more robust and thus better able to bridge the reality gap.
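As a minimal sketch of this idea, the following toy example resamples the simulator's domain parameters at the start of every training episode, so the policy never sees a single fixed physics configuration. The simulator interface (`SimEnv`), the parameter names, and the randomization ranges are all illustrative assumptions, not taken from this text.

```python
import random

class SimEnv:
    """Toy stand-in for a physics simulator (the source domain)."""
    def __init__(self, mass, friction):
        self.mass = mass
        self.friction = friction

def sample_domain_params(rng):
    """Draw domain parameters from the randomization distribution."""
    return {
        "mass": rng.uniform(0.8, 1.2),      # nominal mass +/- 20% (assumed range)
        "friction": rng.uniform(0.5, 1.5),  # wide friction range (assumed)
    }

def train(num_episodes, seed=0):
    """Run each episode in a freshly randomized simulator instance,
    so the policy cannot overfit a single parameter setting."""
    rng = random.Random(seed)
    envs = []
    for _ in range(num_episodes):
        env = SimEnv(**sample_domain_params(rng))
        envs.append(env)
        # ... collect rollouts in `env` and update the policy here ...
    return envs

envs = train(num_episodes=3)
for env in envs:
    print(round(env.mass, 3), round(env.friction, 3))
```

In practice the randomization distribution would cover many more parameters (masses, friction coefficients, latencies, sensor noise), and its ranges are a design choice that trades robustness against conservatism.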
Optimizing a policy on a slightly faulty simulator can easily lead to the maximization of the `Simulation Optimization Bias` (SOB), i.e., the policy exploits the simulator's inaccuracies instead of solving the actual task.
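The effect can be illustrated numerically: a policy parameter optimized on a finite sample of domains looks better on those same domains than on fresh, independently drawn ones, and the gap between the two performance estimates is an optimization bias of this kind. The quadratic objective, the Gaussian domain distribution, and the closed-form optimizer below are simplifying assumptions chosen purely to make the bias visible in a few lines.

```python
import random

def objective(theta, c):
    """Toy per-domain return: higher is better, maximized at theta == c."""
    return -(theta - c) ** 2

def estimate_bias(n_train=10, n_test=10_000, seed=0):
    rng = random.Random(seed)
    train_domains = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
    # "Optimize" the policy parameter on the sampled training domains;
    # for this quadratic objective the maximizer is the sample mean.
    theta_star = sum(train_domains) / n_train
    # Optimistic estimate: performance on the domains used for optimization.
    j_train = sum(objective(theta_star, c) for c in train_domains) / n_train
    # Independent estimate: performance on fresh domains.
    test_domains = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
    j_test = sum(objective(theta_star, c) for c in test_domains) / n_test
    return j_train - j_test  # nonnegative in expectation

print(estimate_bias())  # typically > 0: the optimizer exploited its sample
```

Averaged over many seeds the gap is strictly positive, which mirrors the statement above: the more aggressively one optimizes against a particular (faulty) simulator sample, the more optimistic the in-simulation return becomes relative to performance on unseen domains.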