no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Adish Singla, Goran Radanović
We note that we are the first to provide such a characterization of the problem of learning approximate Nash Equilibrium policies in offline two-player zero-sum Markov games under data corruption.
no code implementations • 4 Mar 2024 • Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla
Moreover, we extend our analysis to the approximate optimization setting and derive exponentially decaying convergence rates for both RLHF and DPO.
no code implementations • 9 Feb 2024 • Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović
We aim to design algorithms that identify a near-optimal policy from the corrupted data, with provable guarantees.
no code implementations • 5 Oct 2021 • Andi Nika, Sepehr Elahi, Cem Tekin
We consider a contextual bandit problem with a combinatorial action set and time-varying base arm availability.
1 code implementation • 28 Aug 2020 • Andi Nika, Sepehr Elahi, Cem Tekin
We consider contextual combinatorial volatile multi-armed bandit (CCV-MAB), in which at each round the learner observes a set of available base arms and their contexts, and then selects a super arm containing $K$ base arms in order to maximize its cumulative reward.
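The round structure described above can be illustrated with a minimal, greatly simplified sketch: an optimistic (UCB-style) index is computed for each currently available base arm, and the super arm is the set of $K$ arms with the highest indices. This is only a toy illustration of the selection step under stated assumptions (dictionary-based arm statistics, a greedy top-$K$ oracle, no context model), not the paper's algorithm.

```python
import math

def ucb_scores(counts, means, t):
    """Optimistic index: empirical mean plus an exploration bonus
    that shrinks as an arm is pulled more often."""
    return {a: means[a] + math.sqrt(2 * math.log(max(t, 2)) / max(counts[a], 1))
            for a in counts}

def select_super_arm(available, counts, means, t, K):
    """One CCV-MAB round (toy version): among the base arms available
    at round t, pick the K arms with the highest UCB index."""
    scores = ucb_scores({a: counts[a] for a in available},
                        {a: means[a] for a in available}, t)
    return sorted(available, key=scores.get, reverse=True)[:K]

# Example round: three arms are available, all pulled equally often,
# so the ranking is driven by the empirical means.
super_arm = select_super_arm(
    available=[1, 2, 3],
    counts={1: 100, 2: 100, 3: 100},
    means={1: 0.9, 2: 0.1, 3: 0.5},
    t=100, K=2)
```

Here the super arm consists of arms 1 and 3, since their empirical means dominate and the exploration bonuses are equal. The volatile aspect is captured by the `available` argument changing from round to round.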
1 code implementation • 24 Jun 2020 • Andi Nika, Kerem Bozgan, Sepehr Elahi, Çağın Ararat, Cem Tekin
We consider the problem of optimizing a vector-valued objective function $\boldsymbol{f}$ sampled from a Gaussian Process (GP) whose index set is a well-behaved, compact metric space $({\cal X}, d)$ of designs.
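With a vector-valued objective there is generally no single maximizer; solutions are compared by Pareto dominance, and the goal is to identify the set of non-dominated designs. A minimal sketch of that comparison, over a finite sample of design/objective-vector pairs (this is an assumed toy setup, not the paper's GP-based method):

```python
def dominates(u, v):
    """u Pareto-dominates v if it is at least as good in every
    objective and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(designs):
    """Return the designs whose objective vectors are not dominated
    by any other design in the sample.

    designs: list of (design, objective_vector) pairs."""
    return [x for i, (x, fx) in enumerate(designs)
            if not any(dominates(fy, fx)
                       for j, (_, fy) in enumerate(designs) if j != i)]

# Two objectives: designs "a" and "b" trade off against each other,
# while "c" is dominated by both.
front = pareto_front([("a", (1, 2)), ("b", (2, 1)), ("c", (0, 0))])
```

In the GP setting, the same dominance check is applied not to exact values of $\boldsymbol{f}$ but to confidence regions built from the posterior, which is what allows sample-efficient identification of an approximate Pareto front over the design space $\cal X$.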