Search Results for author: Assaf Zeevi

Found 22 papers, 3 papers with code

A Doubly Robust Approach to Sparse Reinforcement Learning

no code implementations • 23 Oct 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

We propose a new regret minimization algorithm for episodic sparse linear Markov decision process (SMDP) where the state-transition distribution is a linear function of observed features.

reinforcement-learning

Bayesian Design Principles for Frequentist Sequential Learning

1 code implementation • 1 Oct 2023 • Yunbei Xu, Assaf Zeevi

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles.

Multi-Armed Bandits reinforcement-learning
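
The abstract describes deriving frequentist bandit algorithms from Bayesian principles. As a minimal illustration of that idea (not the paper's algorithm), here is standard Thompson sampling on a Bernoulli bandit with Beta posteriors; the arm means are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.7, 0.5, 0.3]   # hypothetical Bernoulli arm means
K, T = len(true_means), 5000
successes = np.ones(K)          # Beta(1, 1) priors on each arm's mean
failures = np.ones(K)

for _ in range(T):
    # Sample a plausible mean for each arm from its Beta posterior,
    # then play the arm whose sample is largest.
    theta = rng.beta(successes, failures)
    arm = int(np.argmax(theta))
    reward = rng.random() < true_means[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

pulls = successes + failures - 2   # pull counts per arm
```

The posterior concentrates on the best arm, so its pull count dominates; the paper's contribution is a theory for why such Bayesian-derived rules also enjoy frequentist regret guarantees.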

Last Switch Dependent Bandits with Monotone Payoff Functions

no code implementations • 1 Jun 2023 • Ayoub Foussoul, Vineet Goyal, Orestis Papadigenopoulos, Assaf Zeevi

In a recent work, Laforgue et al. introduce the model of last switch dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment.

Pareto Front Identification with Regret Minimization

no code implementations • 31 May 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

The sample complexity of our proposed algorithm is $\tilde{O}(d/\Delta^2)$, where $d$ is the dimension of contexts and $\Delta$ is a measure of problem complexity.

Active Learning

Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback

no code implementations • 31 Jan 2023 • Wonyoung Kim, Garud Iyengar, Assaf Zeevi

We consider the linear contextual multi-class multi-period packing problem (LMMP) where the goal is to pack items such that the total vector of consumption is below a given budget vector and the total value is as large as possible.

Management Multi-Armed Bandits

Complexity Analysis of a Countable-armed Bandit Problem

no code implementations • 18 Jan 2023 • Anand Kalvit, Assaf Zeevi

We also show that the instance-independent (minimax) regret is $\tilde{\mathcal{O}}\left( \sqrt{n} \right)$ when $K=2$.

Online Allocation and Learning in the Presence of Strategic Agents

no code implementations • 25 Sep 2022 • Steven Yin, Shipra Agrawal, Assaf Zeevi

We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them.

Bandits with Dynamic Arm-acquisition Costs

no code implementations • 23 Oct 2021 • Anand Kalvit, Assaf Zeevi

We consider a bandit problem where at any time, the decision maker can add new arms to her consideration set.

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

no code implementations • NeurIPS 2021 • Anand Kalvit, Assaf Zeevi

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap.

Thompson Sampling
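
To make the role of the instance gap concrete, a textbook UCB1 run (an illustrative sketch, not the paper's analysis) shows how the suboptimal arm's pull count is governed by the gap between the top two means:

```python
import math
import random

random.seed(1)
means = [0.9, 0.4]          # hypothetical arms; instance gap Delta = 0.5
T = 2000
counts = [0, 0]
sums = [0.0, 0.0]

for t in range(1, T + 1):
    if t <= len(means):     # initialization: play each arm once
        arm = t - 1
    else:
        # UCB1 index: empirical mean plus an exploration bonus that
        # shrinks as an arm accumulates pulls.
        arm = max(range(len(means)),
                  key=lambda i: sums[i] / counts[i]
                  + math.sqrt(2 * math.log(t) / counts[i]))
    reward = 1.0 if random.random() < means[arm] else 0.0
    counts[arm] += 1
    sums[arm] += reward
```

With this large gap the suboptimal arm is pulled only O(log T / Delta^2) times; as the gap shrinks toward zero, exploration cost grows, which is the complexity driver the abstract refers to.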

From Finite to Countable-Armed Bandits

no code implementations • NeurIPS 2020 • Anand Kalvit, Assaf Zeevi

We consider a stochastic bandit problem with countably many arms that belong to a finite set of types, each characterized by a unique mean reward.

Dynamic Pricing and Learning under the Bass Model

no code implementations • 9 Mar 2021 • Shipra Agrawal, Steven Yin, Assaf Zeevi

Equivalently, the goal is to minimize the regret which measures the revenue loss of the algorithm relative to the optimal expected revenue achievable under the stochastic Bass model with market size $m$ and time horizon $T$.
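
For readers unfamiliar with the Bass model, the cumulative-adoption curve it refers to has a standard closed form; the sketch below evaluates it with illustrative parameter values (p, q, m are placeholders, not taken from the paper):

```python
import math

def bass_cumulative(t, p, q, m):
    """Cumulative adoptions by time t under the Bass diffusion model:
    innovation rate p, imitation rate q, market size m."""
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

# Hypothetical parameters for illustration only.
p, q, m = 0.03, 0.38, 1_000_000
adoption_curve = [bass_cumulative(t, p, q, m) for t in range(0, 21, 5)]
```

The curve starts at zero, rises in the familiar S-shape, and saturates at the market size m; the paper's pricing problem is to learn (p, q, m) while setting prices along this trajectory.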

Learning to Stop with Surprisingly Few Samples

no code implementations • 19 Feb 2021 • Daniel Russo, Assaf Zeevi, Tianyi Zhang

We consider a discounted infinite horizon optimal stopping problem.

Towards Problem-dependent Optimal Learning Rates

no code implementations • NeurIPS 2020 • Yunbei Xu, Assaf Zeevi

We study problem-dependent rates, i.e., generalization errors that scale tightly with the variance or the effective loss at the "best hypothesis."

Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory

no code implementations • 12 Nov 2020 • Yunbei Xu, Assaf Zeevi

We introduce a principled framework dubbed "uniform localized convergence," and characterize sharp problem-dependent rates for central statistical learning problems.

Learning Theory Stochastic Optimization

Sparsity-Agnostic Lasso Bandit

1 code implementation • 16 Jul 2020 • Min-hwan Oh, Garud Iyengar, Assaf Zeevi

We consider a stochastic contextual bandit problem where the dimension $d$ of the feature vectors is potentially large; however, only a sparse subset of features of cardinality $s_0 \ll d$ affects the reward function.
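
The sparse-regression ingredient behind such algorithms can be sketched with a plain lasso solved by proximal gradient descent (ISTA). This is a generic illustration of recovering an $s_0$-sparse reward vector, not the paper's bandit procedure; the data and penalty below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 400, 20, 0.05
theta = np.zeros(d)
theta[:3] = [2.0, -2.0, 1.5]          # only s0 = 3 features matter
X = rng.standard_normal((n, d))       # observed contexts
y = X @ theta + 0.1 * rng.standard_normal(n)  # noisy rewards

# ISTA on (1/2n)||Xw - y||^2 + lam * ||w||_1
L = np.linalg.norm(X.T @ X, 2) / n    # Lipschitz constant of the gradient
w = np.zeros(d)
for _ in range(1000):
    grad = X.T @ (X @ w - y) / n
    z = w - grad / L                   # gradient step
    w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
```

Soft thresholding zeroes out the irrelevant coordinates, which is why lasso-based bandits can achieve regret scaling with the sparsity $s_0$ rather than the ambient dimension $d$.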

Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits

no code implementations • 15 Jul 2020 • Yunbei Xu, Assaf Zeevi

The principle of optimism in the face of uncertainty is one of the most widely used and successful ideas in multi-armed bandits and reinforcement learning.

counterfactual Multi-Armed Bandits +1

Discriminative Learning via Adaptive Questioning

no code implementations • 11 Apr 2020 • Achal Bassamboo, Vikas Deep, Sandeep Juneja, Assaf Zeevi

We consider this problem in a fixed-confidence $\delta$-correct framework, which in our setting seeks to arrive at the correct ability discrimination at the fastest possible rate while guaranteeing that the probability of error is less than a pre-specified and small $\delta$.

A General Framework for Bandit Problems Beyond Cumulative Objectives

no code implementations • 4 Jun 2018 • Asaf Cassel, Shie Mannor, Assaf Zeevi

Unlike the case of cumulative criteria, in the problems we study here the oracle policy, that knows the problem parameters a priori and is used to "center" the regret, is not trivial.

Multi-Armed Bandits

MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

no code implementations • 13 Jun 2017 • Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

The retailer observes this choice and the objective is to dynamically learn the model parameters, while optimizing cumulative revenues over a selling horizon of length $T$.

Thompson Sampling for the MNL-Bandit

no code implementations • 3 Jun 2017 • Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes a (bandit) feedback in the form of the index of one of the items in said subset, or none.

Thompson Sampling
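
The "bandit feedback" in the two MNL-bandit entries above is a choice drawn from a multinomial logit model. A minimal sketch of the MNL choice probabilities (standard model, with made-up attraction parameters):

```python
import numpy as np

def mnl_choice_probs(v):
    """MNL choice probabilities for an offered assortment with attraction
    parameters v; index 0 of the result is the no-purchase option."""
    v = np.asarray(v, dtype=float)
    denom = 1.0 + v.sum()             # the "1" is the no-purchase option
    return np.concatenate(([1.0 / denom], v / denom))

probs = mnl_choice_probs([0.8, 0.5, 0.2])   # hypothetical attractions
```

Each offered item is chosen in proportion to its attraction, and the learner only observes which index was picked; the papers study how to learn the attractions from this censored feedback while optimizing assortments.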

Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards

no code implementations • NeurIPS 2014 • Omar Besbes, Yonatan Gur, Assaf Zeevi

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution.

Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards

1 code implementation • 13 May 2014 • Omar Besbes, Yonatan Gur, Assaf Zeevi

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution.
