no code implementations • 1 Jun 2023 • Ayoub Foussoul, Vineet Goyal, Orestis Papadigenopoulos, Assaf Zeevi
In a recent work, Laforgue et al. introduce the model of last switch dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment.
no code implementations • 29 May 2022 • Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai
Even assuming prior knowledge of the mean payoff functions, computing an optimal plan in the above model is NP-hard, and the state of the art is a $1/4$-approximation algorithm for the case where at most one arm can be played per round.
no code implementations • 26 May 2022 • Alexia Atsidakou, Constantine Caramanis, Evangelia Gergatsouli, Orestis Papadigenopoulos, Christos Tzamos
Pandora's Box is a fundamental stochastic optimization problem, where the decision-maker must find a good alternative while minimizing the search cost of exploring the value of each alternative.
no code implementations • 22 May 2021 • Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai
Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time elapsed since the arm was last pulled.
no code implementations • NeurIPS 2021 • Orestis Papadigenopoulos, Constantine Caramanis
A recent line of research focuses on the study of stochastic multi-armed bandits (MAB), in the case where temporal correlations of specific structure are imposed between the player's actions and the reward distributions of the arms.
no code implementations • NeurIPS 2021 • Orestis Papadigenopoulos, Constantine Caramanis
A recent line of research focuses on the study of the stochastic multi-armed bandit (MAB) problem, in the case where temporal correlations of specific structure are imposed between the player's actions and the reward distributions of the arms (Kleinberg and Immorlica [FOCS18], Basu et al. [NeurIPS19]).
no code implementations • 6 Mar 2020 • Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai
Assuming knowledge of the context distribution and the mean reward of each arm-context pair, we cast the problem as an online bipartite matching problem, where the right-vertices (contexts) arrive stochastically and the left-vertices (arms) are blocked for a finite number of rounds each time they are matched.
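To make the matching view above concrete, here is a minimal simulation sketch of the setting, not the algorithm from the paper. It assumes a simple greedy baseline: each round, the arriving context is matched to the available arm with the highest mean reward. The names `simulate_greedy`, `mean_reward`, `delays`, and `context_probs` are hypothetical. It also assumes that an arm with delay `d` matched in round `t` becomes available again in round `t + d`, and it accumulates mean rewards rather than sampled ones.

```python
import random


def simulate_greedy(mean_reward, delays, context_probs, horizon, seed=0):
    """Greedy baseline (an assumption, not the paper's method).

    mean_reward[i][j]: mean reward of arm i under context j.
    delays[i]: arm i matched in round t is available again in round t + delays[i].
    context_probs[j]: probability that context j arrives in a round.
    Returns the total expected reward collected over `horizon` rounds.
    """
    rng = random.Random(seed)
    n_arms = len(delays)
    contexts = list(range(len(context_probs)))
    free_at = [0] * n_arms  # first round in which each arm is available again
    total = 0.0
    for t in range(horizon):
        # A right-vertex (context) arrives stochastically.
        j = rng.choices(contexts, weights=context_probs)[0]
        # Left-vertices (arms) that are not currently blocked.
        available = [i for i in range(n_arms) if free_at[i] <= t]
        if not available:
            continue  # every arm is blocked, so the context goes unmatched
        # Match the context to the best available arm, then block that arm.
        i = max(available, key=lambda a: mean_reward[a][j])
        total += mean_reward[i][j]
        free_at[i] = t + delays[i]
    return total
```

For example, `simulate_greedy([[1.0], [0.5]], [2, 2], [1.0], 4)` alternates between the two arms, because each one is blocked in the round after it is matched.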