no code implementations • 19 Aug 2022 • Nika Haghtalab, Thodoris Lykouris, Sloan Nietert, Alex Wei
Although learning in Stackelberg games is well understood when the agent is myopic, non-myopic agents pose additional complications.
no code implementations • 5 Jun 2022 • Daniel Freund, Thodoris Lykouris, Wentao Weng
We study decentralized multi-agent learning in bipartite queueing systems, a standard model for service systems.
no code implementations • NeurIPS 2021 • Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire
We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well-specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon.
1 code implementation • NeurIPS 2020 • Kianté Brantley, Miroslav Dudík, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We propose an algorithm for tabular episodic reinforcement learning with constraints.
no code implementations • ICML 2020 • Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme
We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component.
no code implementations • 26 Feb 2020 • Akshay Krishnamurthy, Thodoris Lykouris, Chara Podimata, Robert Schapire
We initiate the study of contextual search when some of the agents can behave in ways inconsistent with the underlying response model.
no code implementations • 20 Nov 2019 • Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits.
no code implementations • 18 Sep 2019 • Avrim Blum, Thodoris Lykouris
We demonstrate that the task of satisfying this guarantee for multiple overlapping groups is not straightforward and show that, for the simple objective of the unweighted average of the false negative and false positive rates, satisfying this for overlapping populations can be statistically impossible even when we are provided predictors that perform well separately on each subgroup.
no code implementations • 23 May 2019 • Thodoris Lykouris, Eva Tardos, Drishti Wali
We study the stochastic multi-armed bandit problem with the graph-based feedback structure introduced by Mannor and Shamir.
no code implementations • NeurIPS 2018 • Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, Nathan Srebro
We study the interplay between sequential decision making and avoiding discrimination against protected groups, when examples arrive online and do not follow distributional assumptions.
no code implementations • 25 Mar 2018 • Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme
We introduce a new model of stochastic bandits with adversarial corruptions, which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially changed to trick the algorithm, e.g., click fraud, fake reviews and email spam.
no code implementations • ICML 2018 • Thodoris Lykouris, Sergei Vassilvitskii
Traditional online algorithms encapsulate decision making under uncertainty, and give ways to hedge against all possible future events, while guaranteeing a nearly optimal solution as compared to an offline optimum.
no code implementations • 9 Nov 2017 • Thodoris Lykouris, Karthik Sridharan, Eva Tardos
We develop a black-box approach for such problems where the learner observes as feedback only losses of a subset of the actions that includes the selected action.
no code implementations • NeurIPS 2016 • Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, Eva Tardos
We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games.