Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms

no code implementations · 21 Jul 2023 · Khashayar Khosravi, Renato Paes Leme, Chara Podimata, Apostolis Tsorvantzis

We present online learning algorithms for any possible value of the evolution rate $\lambda$ and we show the robustness of our results to various model misspecifications.

Batched Neural Bandits

no code implementations · 25 Feb 2021 · Quanquan Gu, Amin Karbasi, Khashayar Khosravi, Vahab Mirrokni, Dongruo Zhou

In many sequential decision-making problems, the individuals are split into several batches and the decision-maker is only allowed to change her policy at the end of batches.

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

1 code implementation · NeurIPS 2020 · Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

We study the structure of regret-minimizing policies in the many-armed Bayesian multi-armed bandit problem: in particular, with $k$ the number of arms and $T$ the time horizon, we consider the case where $k \geq \sqrt{T}$.

The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

2 code implementations · 24 Feb 2020 · Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

This finding diverges from the notion of free exploration, which relates to covariate variation, as recently discussed in contextual bandit literature.

Non-Parametric Inference Adaptive to Intrinsic Dimension

1 code implementation · 11 Jan 2019 · Khashayar Khosravi, Greg Lewis, Vasilis Syrgkanis

We show that if the intrinsic dimension of the covariate distribution is equal to $d$, then the finite sample estimation error of our estimator is of order $n^{-1/(d+2)}$ and our estimate is $n^{1/(d+2)}$-asymptotically normal, irrespective of $D$.

Matrix Completion Methods for Causal Panel Data Models

2 code implementations · 27 Oct 2017 · Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, Khashayar Khosravi

In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations.

Mostly Exploration-Free Algorithms for Contextual Bandits

1 code implementation · 28 Apr 2017 · Hamsa Bastani, Mohsen Bayati, Khashayar Khosravi

We prove that this algorithm is rate optimal without any additional assumptions on the context distribution or the number of arms.

