no code implementations • 23 Oct 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi
We propose a new regret minimization algorithm for the episodic sparse linear Markov decision process (SMDP), where the state-transition distribution is a linear function of observed features.
1 code implementation • 1 Oct 2023 • Yunbei Xu, Assaf Zeevi
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles.
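As a minimal illustration of deriving a bandit algorithm from Bayesian principles (a standard Beta-Bernoulli Thompson sampling sketch, not the algorithm developed in this paper):

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: sample a mean from each arm's
    posterior and pull the arm whose sample is largest."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta(1, 1) uniform priors
    beta = [1] * k
    total_reward = 0
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward       # posterior update on success
        beta[arm] += 1 - reward    # posterior update on failure
        total_reward += reward
    return total_reward
```

Although the sampling rule is Bayesian, its regret can be analyzed in the frequentist sense, which is the bridge this line of work formalizes.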
no code implementations • 1 Jun 2023 • Ayoub Foussoul, Vineet Goyal, Orestis Papadigenopoulos, Assaf Zeevi
In a recent work, Laforgue et al. introduce the model of last-switch-dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment.
no code implementations • 31 May 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi
The sample complexity of our proposed algorithm is $\tilde{O}(d/\Delta^2)$, where $d$ is the dimension of contexts and $\Delta$ is a measure of problem complexity.
no code implementations • 31 Jan 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi
We consider the linear contextual multi-class multi-period packing problem (LMMP) where the goal is to pack items such that the total vector of consumption is below a given budget vector and the total value is as large as possible.
no code implementations • 18 Jan 2023 • Anand Kalvit, Assaf Zeevi
We also show that the instance-independent (minimax) regret is $\tilde{\mathcal{O}}\left( \sqrt{n} \right)$ when $K=2$.
no code implementations • 25 Sep 2022 • Steven Yin, Shipra Agrawal, Assaf Zeevi
We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them.
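A naive greedy baseline for this allocation problem (a sketch for intuition, not the paper's algorithm) assigns each arriving item to the highest-valuation agent whose quota is not yet exhausted:

```python
def allocate(valuations, quotas):
    """Sequentially assign each item to the agent who values it most
    among agents still below their quota.
    valuations: T x n matrix (list of lists); quotas: items per agent.
    Assumes sum(quotas) >= number of items."""
    n = len(quotas)
    counts = [0] * n
    assignment = []
    for vals in valuations:
        eligible = [i for i in range(n) if counts[i] < quotas[i]]
        agent = max(eligible, key=lambda i: vals[i])
        counts[agent] += 1
        assignment.append(agent)
    return assignment, counts
```

The difficulty in the online setting is that valuations arrive sequentially, so greedy choices made early can force low-value assignments later to satisfy the quota constraint.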
no code implementations • 23 Oct 2021 • Anand Kalvit, Assaf Zeevi
We consider a bandit problem where at any time, the decision maker can add new arms to her consideration set.
no code implementations • NeurIPS 2021 • Anand Kalvit, Assaf Zeevi
One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap.
no code implementations • NeurIPS 2020 • Anand Kalvit, Assaf Zeevi
We consider a stochastic bandit problem with countably many arms that belong to a finite set of types, each characterized by a unique mean reward.
no code implementations • 9 Mar 2021 • Shipra Agrawal, Steven Yin, Assaf Zeevi
Equivalently, the goal is to minimize the regret which measures the revenue loss of the algorithm relative to the optimal expected revenue achievable under the stochastic Bass model with market size $m$ and time horizon $T$.
no code implementations • 19 Feb 2021 • Daniel Russo, Assaf Zeevi, Tianyi Zhang
We consider a discounted infinite horizon optimal stopping problem.
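For intuition, the discounted optimal stopping value function satisfies $V(s) = \max\{g(s), \gamma\, \mathbb{E}[V(s')]\}$, which can be computed by value iteration on a toy finite state space (the paper treats a far more general setting):

```python
def stopping_values(rewards, transition, gamma, iters=500):
    """Value iteration for discounted optimal stopping on a finite chain:
    V(s) = max( stop reward g(s), gamma * sum_t P(s, t) * V(t) ).
    rewards: stopping payoff per state; transition: row-stochastic matrix."""
    n = len(rewards)
    v = [0.0] * n
    for _ in range(iters):
        v = [max(rewards[s],
                 gamma * sum(transition[s][t] * v[t] for t in range(n)))
             for s in range(n)]
    return v
```

The optimal policy stops in state $s$ exactly when $g(s)$ attains the maximum above.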
no code implementations • NeurIPS 2020 • Yunbei Xu, Assaf Zeevi
We study problem-dependent rates, i.e., generalization errors that scale tightly with the variance or the effective loss at the "best hypothesis."
no code implementations • 12 Nov 2020 • Yunbei Xu, Assaf Zeevi
We introduce a principled framework dubbed "uniform localized convergence," and characterize sharp problem-dependent rates for central statistical learning problems.
1 code implementation • 16 Jul 2020 • Min-hwan Oh, Garud Iyengar, Assaf Zeevi
We consider a stochastic contextual bandit problem where the dimension $d$ of the feature vectors is potentially large; however, only a sparse subset of features of cardinality $s_0 \ll d$ affects the reward function.
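A minimal sketch of the sparse-bandit idea, assuming a simple greedy scheme: fit a Lasso estimate of the reward parameter from past observations (here via plain proximal gradient / ISTA), then play the arm with the highest predicted reward. This illustrates how $\ell_1$ regularization exploits sparsity; it is not the paper's algorithm.

```python
def soft_threshold(z, t):
    """Proximal operator of the l1 norm."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def lasso_fit(X, y, lam, steps=200, lr=0.01):
    """Lasso via proximal gradient (ISTA) on 0.5 * ||X theta - y||^2."""
    d = len(X[0])
    theta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(t * x for t, x in zip(theta, xi)) - yi
            for j in range(d):
                grad[j] += err * xi[j]
        theta = [soft_threshold(theta[j] - lr * grad[j], lr * lam)
                 for j in range(d)]
    return theta

def greedy_arm(contexts, theta):
    """Play the arm whose context has the highest predicted reward."""
    return max(range(len(contexts)),
               key=lambda i: sum(t * x for t, x in zip(theta, contexts[i])))
```

The $\ell_1$ penalty zeroes out the $d - s_0$ irrelevant coordinates, which is what drives sample-complexity bounds that depend on $s_0$ rather than $d$.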
no code implementations • 15 Jul 2020 • Yunbei Xu, Assaf Zeevi
The principle of optimism in the face of uncertainty is one of the most widely used and successful ideas in multi-armed bandits and reinforcement learning.
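The canonical instance of this principle is the UCB1 index policy; a minimal sketch (for illustration, not this paper's construction):

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """UCB1: pull the arm maximizing empirical mean + sqrt(2 ln t / n_i),
    an optimistic index that upper-bounds the unknown mean w.h.p."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

Acting as if the optimistic estimate were the truth automatically balances exploration (rarely-pulled arms keep large bonuses) against exploitation.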
no code implementations • 11 Apr 2020 • Achal Bassamboo, Vikas Deep, Sandeep Juneja, Assaf Zeevi
We consider this problem within a fixed-confidence $\delta$-correct framework, which in our setting seeks to arrive at the correct ability discrimination at the fastest possible rate while guaranteeing that the probability of error is below a pre-specified small $\delta$.
no code implementations • 4 Jun 2018 • Asaf Cassel, Shie Mannor, Assaf Zeevi
Unlike the case of cumulative criteria, in the problems we study here the oracle policy, which knows the problem parameters a priori and is used to "center" the regret, is not trivial.
no code implementations • 13 Jun 2017 • Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi
The retailer observes this choice, and the objective is to dynamically learn the model parameters while optimizing cumulative revenues over a selling horizon of length $T$.
no code implementations • 3 Jun 2017 • Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi
We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes bandit feedback in the form of the index of one of the items in said subset, or none.
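This line of work models such feedback with the multinomial logit (MNL) choice model, under which the probability of each index (or of "none") is given by the following sketch; the utilities here are illustrative placeholders:

```python
import math

def mnl_choice_probs(utilities, subset):
    """Multinomial logit choice probabilities over an offered subset:
    P(item i) = exp(v_i) / (1 + sum_j exp(v_j)), where the '1' in the
    denominator is the no-choice option (returned under key None)."""
    weights = {i: math.exp(utilities[i]) for i in subset}
    denom = 1.0 + sum(weights.values())
    probs = {i: w / denom for i, w in weights.items()}
    probs[None] = 1.0 / denom  # probability that no item is chosen
    return probs
```

The bandit difficulty is that only the realized index is observed, so the utilities $v_i$ must be learned from these censored choices.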
no code implementations • NeurIPS 2014 • Omar Besbes, Yonatan Gur, Assaf Zeevi
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution.
1 code implementation • 13 May 2014 • Omar Besbes, Yonatan Gur, Assaf Zeevi
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution.