no code implementations • 8 Mar 2022 • Ashok Cutkosky, Chris Dann, Abhimanyu Das, Qiuyi Zhang
We study optimization with bandit feedback in the setting where the learner is given additional prior knowledge in the form of an initial hint of the optimal action.