no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Adish Singla, Goran Radanović
We are the first to provide such a characterization for learning approximate Nash equilibrium policies in offline two-player zero-sum Markov games under data corruption.
no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla
Moreover, we extend our analysis to the approximate optimization setting and derive exponentially decaying convergence rates for both RLHF and DPO.
no code implementations • 9 Feb 2024 • Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović
We aim to design algorithms that identify a near-optimal policy from the corrupted data, with provable guarantees.