no code implementations • 15 Apr 2024 • Jiachun Li, David Simchi-Levi, Yining Wang
The contextual bandit with linear reward functions is one of the most extensively studied models in bandit and online learning research.
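As background for the linear-reward setting, here is a minimal sketch of a standard LinUCB-style baseline — not the authors' algorithm; the function names and the exploration parameter `alpha` are illustrative:

```python
import numpy as np

def linucb(contexts, rewards_fn, d, n_arms, T, alpha=1.0):
    """LinUCB-style linear contextual bandit for T rounds.

    contexts: function t -> array of shape (n_arms, d) of arm features
    rewards_fn: function (t, arm) -> observed reward
    Returns the sequence of pulled arms.
    """
    A = [np.eye(d) for _ in range(n_arms)]    # per-arm regularized Gram matrices
    b = [np.zeros(d) for _ in range(n_arms)]  # per-arm feature-weighted reward sums
    pulls = []
    for t in range(T):
        X = contexts(t)
        ucbs = []
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta = A_inv @ b[a]              # ridge estimate for arm a
            x = X[a]
            # optimistic index: mean estimate plus exploration bonus
            ucbs.append(x @ theta + alpha * np.sqrt(x @ A_inv @ x))
        a = int(np.argmax(ucbs))
        r = rewards_fn(t, a)
        A[a] += np.outer(X[a], X[a])
        b[a] += r * X[a]
        pulls.append(a)
    return pulls
```

With deterministic rewards from a fixed linear model, the policy quickly concentrates on the better arm.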
no code implementations • 18 Feb 2024 • Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu
We start with general arrival distributions and show that a simple policy achieves $O(\sqrt{T})$ regret.
no code implementations • 16 Jan 2024 • Jiachun Li, Kaining Shi, David Simchi-Levi
In this paper, we investigate the tradeoff between loss of social welfare and statistical power in contextual bandit experiments.
no code implementations • 28 Nov 2023 • Xi Chen, David Simchi-Levi, Yining Wang
This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints in scenarios with uncertain demand, achieving an optimal regret upper bound.
no code implementations • 10 Apr 2023 • David Simchi-Levi, Zeyu Zheng, Feng Zhu
A novel policy is proposed to characterize the optimal regret tail probability for any regret threshold.
no code implementations • 7 Jun 2022 • David Simchi-Levi, Zeyu Zheng, Feng Zhu
Starting from the two-armed bandit setting with time horizon $T$, we propose a simple policy and prove that it (i) enjoys worst-case optimality for the expected regret at order $O(\sqrt{T\ln T})$ and (ii) guarantees that the worst-case probability of incurring linear regret decays at the exponential rate $\exp(-\Omega(\sqrt{T}))$, a rate we prove to be the best achievable for all worst-case optimal policies.
no code implementations • 16 Apr 2022 • Prem Talwai, David Simchi-Levi
We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions.
no code implementations • 21 Nov 2021 • Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu
This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL.
no code implementations • 1 Nov 2021 • N. Bora Keskin, David Simchi-Levi, Prem Talwai
The seller does not know the parameters of the products' linear demand model, and can dynamically adjust product prices to learn the demand model based on sales observations.
no code implementations • 28 Jun 2021 • David Simchi-Levi, Zeyu Zheng, Feng Zhu
Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the arm's idle time increases.
no code implementations • 16 May 2021 • Prem Talwai, Ali Shameli, David Simchi-Levi
We develop novel learning rates for conditional mean embeddings by applying the theory of interpolation for reproducing kernel Hilbert spaces (RKHS).
no code implementations • 7 Oct 2020 • Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes.
no code implementations • 7 Oct 2020 • Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
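The gap-dependent behavior referenced here can be seen in the classical UCB1 algorithm — a generic sketch for illustration, not the paper's contextual-bandit contribution: the number of pulls of a suboptimal arm scales like $\log(T)/\Delta^2$, so instances with a large gap $\Delta$ incur less regret.

```python
import math
import random

def ucb1(means, T, seed=0):
    """UCB1 on Bernoulli arms; returns per-arm pull counts.

    On 'easy' instances (large gap between best and second-best mean),
    the suboptimal arm is pulled only O(log T / gap^2) times.
    """
    rng = random.Random(seed)
    K = len(means)
    n = [0] * K       # pull counts
    s = [0.0] * K     # reward sums
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1  # pull each arm once to initialize
        else:
            a = max(range(K),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        n[a] += 1
        s[a] += r
    return n
```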
no code implementations • 28 Sep 2020 • Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Basar
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes (MDPs).
no code implementations • 27 Sep 2020 • Xi Chen, David Simchi-Levi, Yining Wang
In this paper, we consider a dynamic pricing problem over $T$ time periods with an \emph{unknown} demand function of posted price and personalized information.
no code implementations • 30 Jun 2020 • Xiao-Yue Gong, David Simchi-Levi
Motivated by the episodic version of the classical inventory control problem, we propose a new Q-learning-based algorithm, Elimination-Based Half-Q-Learning (HQL), that enjoys improved efficiency over existing algorithms for a wide variety of problems in the one-sided-feedback setting.
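HQL builds on Q-learning; as background only, here is the standard tabular Q-learning update it departs from — this is not HQL's elimination-based, one-sided-feedback variant, and the names are illustrative:

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning step: move Q(s, a) toward the
    bootstrapped target r + gamma * max_a' Q(s', a').

    Q is a list of lists indexed as Q[state][action].
    """
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```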
no code implementations • ICML 2020 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets.
no code implementations • 2 May 2020 • David Simchi-Levi, Rui Sun, Huanan Zhang
We show that our learning algorithm can converge to the optimal algorithm that has access to the true demand functions, and we prove that the convergence rate is tight up to a certain logarithmic term.
no code implementations • 28 Mar 2020 • David Simchi-Levi, Yunzong Xu
We consider the general (stochastic) contextual bandit problem under the realizability assumption, i.e., the expected reward, as a function of contexts and actions, belongs to a general function class $\mathcal{F}$.
no code implementations • 4 Nov 2019 • David Simchi-Levi, Yunzong Xu, Jinglong Zhao
Our work reveals a surprising result: the optimal regret rate is completely characterized by a piecewise-constant function of the switching budget, which further depends on the number of resource constraints. To the best of our knowledge, this is the first time the number of resource constraints has been shown to play a fundamental role in determining the statistical complexity of online learning problems.
no code implementations • ICML 2020 • Jinzhi Bu, David Simchi-Levi, Yunzong Xu
We study a single-product dynamic pricing problem over a selling horizon of $T$ periods.
no code implementations • 7 Jun 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Notably, the interplay between endogeneity and exogeneity presents a unique challenge, absent in existing (stationary and non-stationary) stochastic online learning settings, when we apply the conventional Optimism in the Face of Uncertainty principle to design algorithms with provably low dynamic regret for RL in drifting MDPs.
no code implementations • NeurIPS 2019 • David Simchi-Levi, Yunzong Xu
We consider the classical stochastic multi-armed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget.
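A switching budget can be respected by exploring in contiguous blocks and then committing. The sketch below is a generic explore-then-commit illustration, not the paper's optimal policy; the exploration length `T**(2/3)` and the function name are assumptions for the example:

```python
import random

def etc_with_switch_budget(means, T, seed=0):
    """Explore each Bernoulli arm in one contiguous block, then commit
    to the empirically best arm: the number of action switches is at
    most K (one per block boundary plus the commit), independent of T.
    """
    rng = random.Random(seed)
    K = len(means)
    block = max(1, int(T ** (2 / 3)) // K)  # exploration length per arm
    history, est = [], []
    for a in range(K):
        wins = sum(rng.random() < means[a] for _ in range(block))
        history += [a] * block
        est.append(wins / block)
    best = max(range(K), key=lambda a: est[a])
    history += [best] * (T - len(history))
    switches = sum(1 for i in range(1, len(history)) if history[i] != history[i - 1])
    return history, switches
```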
no code implementations • 19 Mar 2019 • Rong Jin, David Simchi-Levi, Li Wang, Xinshang Wang, Sen Yang
In this paper, we study algorithms for dynamically identifying a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers' ultra-fast delivery platforms.
no code implementations • 4 Mar 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, we can further enjoy the (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner.
no code implementations • 28 Feb 2019 • Hamsa Bastani, David Simchi-Levi, Ruihao Zhu
We study the problem of learning shared structure \emph{across} a sequence of dynamic pricing experiments for related products.
no code implementations • NeurIPS 2018 • Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang
Classically, the time complexity of a first-order method is estimated by its number of gradient computations.
no code implementations • 11 Oct 2018 • Wang Chi Cheung, Will Ma, David Simchi-Levi, Xinshang Wang
We overcome both the challenges of model uncertainty and customer heterogeneity by judiciously synthesizing two algorithmic frameworks from the literature: inventory balancing, which "reserves" a portion of each resource for high-reward customer types which could later arrive, and online learning, which shows how to "explore" the resource consumption distributions of each customer type under different actions.
no code implementations • 6 Oct 2018 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary linear stochastic bandit setting.
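Forgetting stale observations is the core device behind sliding-window methods for drifting environments. The estimator below is a generic sketch of that idea, not the paper's sliding-window UCB or bandit-over-bandit construction:

```python
from collections import deque

class SlidingWindowMean:
    """Reward estimate over only the last `window` observations, so the
    estimate tracks a drifting mean instead of averaging over all time."""

    def __init__(self, window):
        self.buf = deque(maxlen=window)  # old samples fall off automatically

    def update(self, r):
        self.buf.append(r)

    def mean(self):
        return sum(self.buf) / len(self.buf) if self.buf else 0.0
```

After a change point, the window flushes the pre-change samples and the estimate recovers within `window` steps.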
no code implementations • 12 Sep 2017 • Yan Zhao, Xiao Fang, David Simchi-Levi
The problem of constructing such models from randomized experiments data is known as Uplift Modeling in the literature.
2 code implementations • 23 May 2017 • Yan Zhao, Xiao Fang, David Simchi-Levi
It is impossible to know whether the chosen treatment is optimal for an individual subject because response under alternative treatments is unobserved.
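Because individual treatment effects are unobservable, uplift is estimated from group-level contrasts. A minimal two-group (T-learner-style) sketch of this idea, without covariates — an illustration of the estimand, not the authors' method:

```python
def two_model_uplift(data):
    """Estimate average uplift from randomized-experiment data as
    mean(outcome | treated) - mean(outcome | control).

    data: list of (treated: bool, outcome: float) tuples.
    """
    treated = [y for t, y in data if t]
    control = [y for t, y in data if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)
```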
no code implementations • 1 Apr 2017 • Wang Chi Cheung, David Simchi-Levi
We first propose an efficient online policy which incurs a regret $\tilde{O}(T^{2/3})$, where $T$ is the number of customers in the sales horizon.