no code implementations • 8 Aug 2023 • William Black, Ercument Ilhan, Andrea Marchini, Vilda Markeviciute
This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale.
2 code implementations • 17 Apr 2021 • Ercument Ilhan, Jeremy Gow, Diego Perez-Liebana
Action advising is a peer-to-peer knowledge exchange technique built on the teacher-student paradigm to alleviate the sample inefficiency problem in deep reinforcement learning.
1 code implementation • 17 Apr 2021 • Ercument Ilhan, Jeremy Gow, Diego Perez-Liebana
However, due to realistic concerns, the number of these interactions is limited by a budget; it is therefore crucial to perform them at the most appropriate moments.
1 code implementation • 1 Oct 2020 • Ercument Ilhan, Jeremy Gow, Diego Perez-Liebana
Action advising is a budget-constrained knowledge exchange mechanism between teacher-student peers that can help tackle exploration and sample inefficiency problems in deep reinforcement learning (RL).
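The budget-constrained advising idea described in these abstracts can be illustrated with a minimal sketch. This is not the authors' method. It assumes a hypothetical rule in which the student asks the teacher for an action only when its own uncertainty exceeds a threshold and advice budget remains. The names `AdvisingBudget`, `choose_action`, `uncertainty`, and `threshold` are introduced here for illustration only.

```python
import random


class AdvisingBudget:
    """Tracks how many teacher queries the student may still make (hypothetical helper)."""

    def __init__(self, total):
        self.remaining = total

    def try_spend(self):
        # Consume one unit of budget if any is left; report whether it succeeded.
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def choose_action(student_action, teacher_action, uncertainty, threshold, budget):
    """Return (action, advised).

    Assumed rule: take the teacher's action only when the student is
    uncertain AND budget remains; otherwise act on the student's own choice.
    """
    if uncertainty > threshold and budget.try_spend():
        return teacher_action, True
    return student_action, False


def run_demo(steps=10, total_budget=3, threshold=0.5, seed=0):
    """Simulate a few steps with random uncertainty values (placeholder data)."""
    rng = random.Random(seed)
    budget = AdvisingBudget(total_budget)
    advised_steps = 0
    for _ in range(steps):
        uncertainty = rng.random()  # stand-in for a real uncertainty estimate
        _, advised = choose_action(0, 1, uncertainty, threshold, budget)
        advised_steps += int(advised)
    return advised_steps, budget.remaining
```

The short-circuit in `choose_action` means budget is spent only when the uncertainty check passes, so confident steps never consume advice. A real system would replace the random uncertainty with an estimate derived from the student's model.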