no code implementations • 27 Sep 2023 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDPs) in the infinite-horizon undiscounted setting.
no code implementations • 31 Mar 2023 • Danil Provodin, Jérémie Joudioux, Eduard Duryev
Although the bandits framework is a classical and well-suited approach for optimal bidding strategies in sponsored search auctions, industrial attempts are rarely documented.
1 code implementation • 8 Sep 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We study a posterior sampling approach to efficient exploration in constrained reinforcement learning.
1 code implementation • 14 Feb 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
Our main theoretical results show that the impact of batch learning on regret is a multiplicative factor of the batch size relative to the regret of online behavior.
1 code implementation • 3 Nov 2021 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We consider a special case of bandit problems, namely batched bandits.