Search Results for author: David Simchi-Levi

Found 33 papers, 1 paper with code

On the Optimal Regret of Locally Private Linear Contextual Bandit

no code implementations15 Apr 2024 Jiachun Li, David Simchi-Levi, Yining Wang

Contextual bandits with linear reward functions are among the most extensively studied models in bandit and online learning research.

Online Local False Discovery Rate Control: A Resource Allocation Approach

no code implementations18 Feb 2024 Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu

We start with general arrival distributions and show that a simple policy achieves a $O(\sqrt{T})$ regret.

Privacy Preserving Adaptive Experiment Design

no code implementations16 Jan 2024 Jiachun Li, Kaining Shi, David Simchi-Levi

In this paper, we investigate the tradeoff between loss of social welfare and statistical power in contextual bandit experiments.

Privacy Preserving

Utility Fairness in Contextual Dynamic Pricing with Demand Learning

no code implementations28 Nov 2023 Xi Chen, David Simchi-Levi, Yining Wang

This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints in scenarios with uncertain demand, achieving an optimal regret upper bound.

Fairness

Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk

no code implementations10 Apr 2023 David Simchi-Levi, Zeyu Zheng, Feng Zhu

A novel policy is proposed to characterize the optimal regret tail probability for any regret threshold.

A Simple and Optimal Policy Design with Safety against Heavy-tailed Risk for Stochastic Bandits

no code implementations7 Jun 2022 David Simchi-Levi, Zeyu Zheng, Feng Zhu

Starting from the two-armed bandit setting with time horizon $T$, we propose a simple policy and prove that it (i) is worst-case optimal for the expected regret, at order $O(\sqrt{T\ln T})$, and (ii) has a worst-case probability of incurring linear regret that decays at the exponential rate $\exp(-\Omega(\sqrt{T}))$, a rate we prove to be the best achievable among all worst-case optimal policies.

Multi-Armed Bandits Thompson Sampling
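
The abstract above contrasts expected regret with the tail probability of large regret. A minimal way to see both quantities in one run is to simulate a standard two-armed policy and track cumulative pseudo-regret; the sketch below uses textbook UCB1 rather than the paper's tail-optimal policy, and the means `mu`, horizon, and seed are illustrative assumptions.

```python
import math
import random

def two_armed_ucb(mu, T, seed=0):
    """Simulate UCB1 on a two-armed Bernoulli bandit and return
    (cumulative pseudo-regret, pull counts).

    Illustrative only: this is the generic UCB1 policy, not the
    tail-optimal policy proposed in the paper; `mu` holds the two
    arms' true mean rewards.
    """
    rng = random.Random(seed)
    counts, sums = [0, 0], [0.0, 0.0]
    regret, best = 0.0, max(mu)
    for t in range(1, T + 1):
        if t <= 2:
            a = t - 1                  # pull each arm once to initialize
        else:
            ucb = [sums[i] / counts[i]
                   + math.sqrt(2 * math.log(t) / counts[i])
                   for i in (0, 1)]
            a = 0 if ucb[0] >= ucb[1] else 1
        reward = 1.0 if rng.random() < mu[a] else 0.0
        counts[a] += 1
        sums[a] += reward
        regret += best - mu[a]         # pseudo-regret uses the true means
    return regret, counts
```

Running this over many seeds and histogramming the final regret gives an empirical view of the regret tail that the abstract's $\exp(-\Omega(\sqrt{T}))$ bound concerns.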

Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition

no code implementations16 Apr 2022 Prem Talwai, David Simchi-Levi

We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions.

regression

Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

no code implementations21 Nov 2021 Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu

This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL.

Decision Making Offline RL +2

Dynamic Pricing and Demand Learning on a Large Network of Products: A PAC-Bayesian Approach

no code implementations1 Nov 2021 N. Bora Keskin, David Simchi-Levi, Prem Talwai

The seller does not know the parameters of the products' linear demand model, and can dynamically adjust product prices to learn the demand model based on sales observations.

Offline Planning and Online Learning under Recovering Rewards

no code implementations28 Jun 2021 David Simchi-Levi, Zeyu Zheng, Feng Zhu

Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the arm's idle time increases.
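
The two features above (pull up to $K$ of $N$ arms; rewards drop after a pull and recover with idle time) can be made concrete with a tiny greedy baseline. This sketch is hypothetical: it assumes the recovery curves are known and linear (parameters `base` and `recover`), whereas the paper treats them as unknown and non-parametric.

```python
def greedy_recovering(base, recover, K, T):
    """Greedy baseline for a recovering-rewards bandit.

    Arm i yields expected reward base[i] * min(1, idle_i / recover[i]):
    zero right after a pull, recovering linearly over recover[i] periods.
    Each period the policy pulls the K arms with the highest current value.
    Illustrative sketch only, not the paper's algorithm.
    """
    N = len(base)
    idle = [10**9] * N                 # every arm starts fully recovered
    total = 0.0
    for _ in range(T):
        value = [base[i] * min(1.0, idle[i] / recover[i]) for i in range(N)]
        chosen = sorted(range(N), key=lambda i: value[i], reverse=True)[:K]
        for i in chosen:
            total += value[i]
            idle[i] = 0                # reward drops right after a pull...
        for i in range(N):
            idle[i] += 1               # ...and recovers as idle time grows
    return total
```

With two identical arms and a two-period recovery, the greedy policy alternates arms and collects the full reward every period, illustrating why rotating pulls beats repeatedly hitting one arm.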

Sobolev Norm Learning Rates for Conditional Mean Embeddings

no code implementations16 May 2021 Prem Talwai, Ali Shameli, David Simchi-Levi

We develop novel learning rates for conditional mean embeddings by applying the theory of interpolation for reproducing kernel Hilbert spaces (RKHS).

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

no code implementations7 Oct 2020 Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.

Active Learning Multi-Armed Bandits +2

Privacy-Preserving Dynamic Personalized Pricing with Demand Learning

no code implementations27 Sep 2020 Xi Chen, David Simchi-Levi, Yining Wang

In this paper, we consider a dynamic pricing problem over $T$ time periods with an \emph{unknown} demand function of posted price and personalized information.

Privacy Preserving

Provably More Efficient Q-Learning in the One-Sided-Feedback/Full-Feedback Settings

no code implementations30 Jun 2020 Xiao-Yue Gong, David Simchi-Levi

Motivated by the episodic version of the classical inventory control problem, we propose a new Q-learning-based algorithm, Elimination-Based Half-Q-Learning (HQL), that enjoys improved efficiency over existing algorithms for a wide variety of problems in the one-sided-feedback setting.

Q-Learning
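
For reference, the classical tabular Q-learning update that HQL builds on looks as follows. This is the generic textbook step, not the paper's elimination-based one-sided-feedback variant; the dict-based `Q` representation is an implementation choice.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[s, a] toward the
    bootstrapped target r + gamma * max_b Q[s_next, b].

    Q is a dict keyed by (state, action), defaulting to 0 for
    unseen pairs; alpha is the learning rate, gamma the discount.
    """
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    target = r + gamma * best_next
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (target - old)
    return Q[(s, a)]
```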

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

no code implementations ICML 2020 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets.

reinforcement-learning Reinforcement Learning (RL)

Online Learning and Optimization for Revenue Management Problems with Add-on Discounts

no code implementations2 May 2020 David Simchi-Levi, Rui Sun, Huanan Zhang

We show that our learning algorithm can converge to the optimal algorithm that has access to the true demand functions, and we prove that the convergence rate is tight up to a certain logarithmic term.

Management

Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability

no code implementations28 Mar 2020 David Simchi-Levi, Yunzong Xu

We consider the general (stochastic) contextual bandit problem under the realizability assumption, i.e., the expected reward, as a function of contexts and actions, belongs to a general function class $\mathcal{F}$.

Multi-Armed Bandits regression
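
The action-selection rule at the heart of this line of work is inverse-gap weighting: actions with predicted rewards close to the best get pulled more often, with a parameter $\gamma$ controlling how aggressively the policy exploits. The sketch below shows that rule in isolation; the regression oracle producing the predictions `fhat`, and the epoch schedule for $\gamma$, are abstracted away.

```python
def inverse_gap_weights(fhat, gamma):
    """Inverse-gap-weighted action distribution over K actions.

    fhat[a] is the current predicted reward of action a for the
    observed context; gamma > 0 is the exploitation parameter.
    Each non-greedy action a gets probability
    1 / (K + gamma * (fhat[best] - fhat[a])), and the greedy
    action absorbs the remaining mass.
    """
    K = len(fhat)
    best = max(range(K), key=lambda a: fhat[a])
    p = [0.0] * K
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (fhat[best] - fhat[a]))
    p[best] = 1.0 - sum(p)
    return p
```

As `gamma` grows, the distribution concentrates on the empirically best action, which is how exploration is phased out over epochs.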

Blind Network Revenue Management and Bandits with Knapsacks under Limited Switches

no code implementations4 Nov 2019 David Simchi-Levi, Yunzong Xu, Jinglong Zhao

Our work reveals a surprising result: the optimal regret rate is completely characterized by a piecewise-constant function of the switching budget, which further depends on the number of resource constraints. To the best of our knowledge, this is the first time the number of resource constraints has been shown to play a fundamental role in determining the statistical complexity of online learning problems.

Decision Making Management

Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism

no code implementations7 Jun 2019 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Notably, the interplay between endogeneity and exogeneity presents a unique challenge, absent in existing (stationary and non-stationary) stochastic online learning settings, when we apply the conventional Optimism in Face of Uncertainty principle to design algorithms with provably low dynamic regret for RL in drifting MDPs.

Decision Making reinforcement-learning +1

Phase Transitions in Bandits with Switching Constraints

no code implementations NeurIPS 2019 David Simchi-Levi, Yunzong Xu

We consider the classical stochastic multi-armed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget.
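
A natural baseline for a switching budget is explore-then-commit with one contiguous exploration block per arm, which costs at most $N$ switches in total. This is a hypothetical sketch, not the paper's optimal policy (the paper shows the optimal rate exhibits phase transitions in the budget); the block length `T // (2 * N)` is an arbitrary heuristic.

```python
def limited_switch_etc(pull, N, T, S):
    """Explore-then-commit under a switching budget: one contiguous
    exploration block per arm, then commit to the empirical best.

    pull(i) samples a reward from arm i; N arms, horizon T,
    switching budget S.  Uses at most N switches total.
    """
    assert S >= N, "one block per arm plus the commit needs ~N switches"
    m = max(1, T // (2 * N))             # exploration pulls per arm
    means, total = [], 0.0
    for i in range(N):
        block = [pull(i) for _ in range(m)]
        means.append(sum(block) / m)
        total += sum(block)
    best = max(range(N), key=lambda i: means[i])
    total += sum(pull(best) for _ in range(T - N * m))   # commit phase
    return best, total
```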

Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses

no code implementations19 Mar 2019 Rong Jin, David Simchi-Levi, Li Wang, Xinshang Wang, Sen Yang

In this paper, we study algorithms for dynamically identifying a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers' ultra-fast delivery platforms.

Hedging the Drift: Learning to Optimize under Non-Stationarity

no code implementations4 Mar 2019 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, we can further enjoy the (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner.

Decision Making

Meta Dynamic Pricing: Transfer Learning Across Experiments

no code implementations28 Feb 2019 Hamsa Bastani, David Simchi-Levi, Ruihao Zhu

We study the problem of learning shared structure \emph{across} a sequence of dynamic pricing experiments for related products.

Thompson Sampling Transfer Learning

The Lingering of Gradients: Theory and Applications

no code implementations NeurIPS 2018 Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang

Classically, the time complexity of a first-order method is estimated by its number of gradient computations.

Management

The Lingering of Gradients: How to Reuse Gradients Over Time

no code implementations NeurIPS 2018 Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang

Classically, the time complexity of a first-order method is estimated by its number of gradient computations.

Management

Inventory Balancing with Online Learning

no code implementations11 Oct 2018 Wang Chi Cheung, Will Ma, David Simchi-Levi, Xinshang Wang

We overcome both the challenges of model uncertainty and customer heterogeneity by judiciously synthesizing two algorithmic frameworks from the literature: inventory balancing, which "reserves" a portion of each resource for high-reward customer types which could later arrive, and online learning, which shows how to "explore" the resource consumption distributions of each customer type under different actions.

Learning to Optimize under Non-Stationarity

no code implementations6 Oct 2018 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary linear stochastic bandit setting.

A Practically Competitive and Provably Consistent Algorithm for Uplift Modeling

no code implementations12 Sep 2017 Yan Zhao, Xiao Fang, David Simchi-Levi

The problem of constructing such models from randomized-experiment data is known as Uplift Modeling in the literature.

Decision Making

Uplift Modeling with Multiple Treatments and General Response Types

2 code implementations23 May 2017 Yan Zhao, Xiao Fang, David Simchi-Levi

It is impossible to know whether the chosen treatment is optimal for an individual subject because response under alternative treatments is unobserved.

Decision Making
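
Because each subject's counterfactual response is unobserved, uplift is estimated at the population level: under randomization, the mean response under each treatment minus the mean response under control is an unbiased uplift estimate. The sketch below is that naive baseline, not the paper's consistent algorithm; the `(treatment, response)` data layout is an assumption.

```python
from collections import defaultdict

def uplift_estimates(data, control=0):
    """Naive per-treatment uplift estimates from randomized data.

    data is an iterable of (treatment, response) pairs; returns
    {treatment: mean response under treatment - mean under control}
    for every non-control treatment.  Valid only because random
    assignment makes the group means comparable.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for t, y in data:
        sums[t] += y
        counts[t] += 1
    base = sums[control] / counts[control]
    return {t: sums[t] / counts[t] - base for t in sums if t != control}
```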

Assortment Optimization under Unknown MultiNomial Logit Choice Models

no code implementations1 Apr 2017 Wang Chi Cheung, David Simchi-Levi

We first propose an efficient online policy which incurs a regret $\tilde{O}(T^{2/3})$, where $T$ is the number of customers in the sales horizon.
