Search Results for author: Goran Radanović

Found 3 papers, 0 papers with code

Corruption-Robust Offline Two-Player Zero-Sum Markov Games

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Adish Singla, Goran Radanović

We note that we are the first to provide such a characterization of the problem of learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under data corruption.

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

Moreover, we extend our analysis to the approximate optimization setting and derive exponentially decaying convergence rates for both RLHF and DPO.
