no code implementations • 3 Nov 2022 • Gal Leibovich, Guy Jacob, Or Avner, Gal Novik, Aviv Tamar
The key challenge is a distribution shift between the desired outputs and the outputs of an initial random guess, and we prove that iterative inversion can steer the learning correctly, under rather strict conditions on the function.
no code implementations • 1 Nov 2021 • Gal Leibovich, Guy Jacob, Shadi Endrawis, Gal Novik, Aviv Tamar
We show that our score, VSDR, can significantly improve the accuracy of policy ranking without requiring additional real-world data.
no code implementations • 10 May 2021 • Shadi Endrawis, Gal Leibovich, Guy Jacob, Gal Novik, Aviv Tamar
In this work, we propose that data collection policies should actively explore the environment to collect diverse data.