no code implementations • 28 Nov 2020 • Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula
In this paper, we propose an optimistic algorithm and show that its regret is bounded by $O(\sqrt{dT} + \kappa)$, a significant improvement over existing methods.
no code implementations • 23 Oct 2020 • Priyank Agrawal, Jinglin Chen, Nan Jiang
This paper studies regret minimization with randomized value functions in reinforcement learning.
no code implementations • 18 Jun 2020 • Priyank Agrawal, Theja Tulabandhula
We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting.
no code implementations • 22 Jan 2020 • Priyank Agrawal, Theja Tulabandhula
We propose a contextual-bandit-based model to capture the learning and social welfare goals of a web platform in the presence of myopic users.
no code implementations • 22 Nov 2018 • Priyank Agrawal, Theja Tulabandhula
We study the effect of impairment on stochastic multi-armed bandits and develop new ways to mitigate it.