1 code implementation • 20 Sep 2022 • Sheelabhadra Dey, Sumedh Pendurkar, Guni Sharon, Josiah P. Hanna
The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize regret with respect to the baseline policy during training, and (b) eventually surpassing the baseline's performance.
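As a rough illustration of objective (a), regret with respect to a baseline is commonly measured as the accumulated gap between the baseline's return and the learner's return over training episodes. The sketch below is a minimal example under stated assumptions: a fixed, known baseline return and one observed learner return per episode. The function name `cumulative_baseline_regret` is hypothetical and does not come from the JIRL implementation, and the paper's exact regret definition may differ.

```python
from typing import List, Sequence


def cumulative_baseline_regret(
    baseline_return: float, episode_returns: Sequence[float]
) -> List[float]:
    """Hypothetical helper: running sum of (baseline return - learner return).

    Assumes a single fixed baseline return; each training episode adds the
    gap between that baseline and the learner's return for the episode.
    """
    regret = []
    total = 0.0
    for ret in episode_returns:
        # Positive when the learner underperforms the baseline,
        # negative once it surpasses the baseline (objective (b)).
        total += baseline_return - ret
        regret.append(total)
    return regret


if __name__ == "__main__":
    # Learner starts below the baseline and later exceeds it.
    print(cumulative_baseline_regret(10.0, [4.0, 8.0, 11.0]))  # [6.0, 8.0, 7.0]
```

Under this definition, keeping early-training returns close to the baseline (for example, by following its demonstrations) limits how much regret accumulates, while episodes that beat the baseline start paying it back.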