no code implementations • 29 Sep 2021 • Yuankun Jiang, Chenglin Li, Wenrui Dai, Junni Zou, Hongkai Xiong
In this paper, we theoretically derive a bias-free, state/environment-dependent optimal baseline for domain randomization (DR), and analytically show that it achieves further variance reduction over the standard constant and state-dependent baselines for DR. We further propose a variance-reduced domain randomization (VRDR) approach for policy gradient methods, which strikes a practical tradeoff between variance reduction and computational complexity.
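The variance-reduction effect of a baseline can be seen in a toy example. The sketch below is illustrative only (a one-step Gaussian-policy bandit with a simple constant baseline, not the paper's state/environment-dependent optimal baseline): subtracting a baseline from the return leaves the policy-gradient estimator unbiased while shrinking its variance.

```python
# Toy illustration: baselines reduce policy-gradient variance.
# One-step bandit with Gaussian policy a ~ N(mu, sigma^2); all values illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.0
reward = lambda a: -(a - 2.0) ** 2        # deterministic toy reward

n = 100_000
a = rng.normal(mu, sigma, size=n)
score = (a - mu) / sigma**2               # d/dmu log N(a; mu, sigma^2)
r = reward(a)

g_plain = score * r                       # vanilla REINFORCE estimator
g_base = score * (r - r.mean())           # constant-baseline estimator

print(g_plain.mean(), g_base.mean())      # both estimate the same gradient
print(g_plain.var(), g_base.var())        # baseline version has lower variance
```

The two estimators agree in expectation because the score function has zero mean; only the variance changes, which is the property the paper's optimal baseline pushes further.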
no code implementations • 1 Jan 2021 • Yuankun Jiang, Chenglin Li, Junni Zou, Wenrui Dai, Hongkai Xiong
To address this, in this paper, we propose a Bayesian linear regression with informative prior (IP-BLR) operator that incorporates a data-dependent prior into the learning of the randomized value function, exploiting the statistics of training results from previous iterations.
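The core mechanism of carrying a data-dependent prior across iterations can be sketched with conjugate Bayesian linear regression. This is a loose illustration of the idea, not the paper's IP-BLR operator: the posterior from one iteration is reused as the (informative) prior for the next, so later fits start from the accumulated statistics of earlier ones.

```python
# Hedged sketch: conjugate Bayesian linear regression where each iteration's
# posterior becomes the next iteration's prior. Shapes and values are illustrative.
import numpy as np

def blr_posterior(X, y, prior_mean, prior_prec, noise_var=0.1):
    """Gaussian posterior over linear weights w given y = X @ w + noise."""
    prec = prior_prec + X.T @ X / noise_var
    mean = np.linalg.solve(prec, prior_prec @ prior_mean + X.T @ y / noise_var)
    return mean, prec

rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0])
mean, prec = np.zeros(2), np.eye(2) * 1e-2     # weak, uninformative initial prior

for _ in range(3):                              # successive training iterations
    X = rng.normal(size=(50, 2))
    y = X @ w_true + rng.normal(scale=0.3, size=50)
    mean, prec = blr_posterior(X, y, mean, prec)  # previous posterior as new prior

print(mean)  # converges toward w_true as data accumulates
```

The design point is that the prior precision grows with each iteration, so fresh data refines rather than overwrites what earlier iterations learned.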
no code implementations • 1 Jan 2021 • Yuankun Jiang, Chenglin Li, Junni Zou, Wenrui Dai, Hongkai Xiong
To mitigate the model discrepancy between the training and target (testing) environments, domain randomization (DR) generates a sufficiently diverse set of environments by randomly sampling environment parameters in the simulator.
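The sampling step described above can be sketched minimally. The parameter names and ranges below are hypothetical placeholders (not from the paper): each training episode draws a fresh set of simulator parameters from fixed ranges, so the policy sees a diverse distribution of environments.

```python
# Minimal sketch of domain randomization: resample simulator parameters
# per episode. Parameter names and ranges are illustrative assumptions.
import random

PARAM_RANGES = {
    "mass": (0.5, 2.0),
    "friction": (0.2, 1.0),
    "motor_gain": (0.8, 1.2),
}

def sample_env_params(rng):
    """Draw one randomized environment configuration."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

rng = random.Random(0)
for episode in range(3):
    params = sample_env_params(rng)     # e.g. pass to simulator reset()
    print(episode, params)
```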