no code implementations • 15 Apr 2024 • Jiachun Li, David Simchi-Levi, Yining Wang
The contextual bandit with linear reward functions is one of the most extensively studied models in bandit and online learning research.
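As background for the linear-reward setting, here is a minimal sketch of a standard LinUCB-style baseline — not the authors' algorithm; the function names and the exploration parameter `alpha` are illustrative:

```python
import numpy as np

def linucb(contexts, rewards_fn, d, n_arms, T, alpha=1.0):
    """LinUCB-style linear contextual bandit for T rounds.

    contexts: function t -> array of shape (n_arms, d) of arm features
    rewards_fn: function (t, arm) -> observed reward
    Returns the sequence of pulled arms.
    """
    A = [np.eye(d) for _ in range(n_arms)]    # per-arm regularized Gram matrices
    b = [np.zeros(d) for _ in range(n_arms)]  # per-arm feature-weighted reward sums
    pulls = []
    for t in range(T):
        X = contexts(t)
        ucbs = []
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta = A_inv @ b[a]              # ridge estimate for arm a
            x = X[a]
            # optimistic index: mean estimate plus exploration bonus
            ucbs.append(x @ theta + alpha * np.sqrt(x @ A_inv @ x))
        a = int(np.argmax(ucbs))
        r = rewards_fn(t, a)
        A[a] += np.outer(X[a], X[a])
        b[a] += r * X[a]
        pulls.append(a)
    return pulls
```

With deterministic rewards from a fixed linear model, the policy quickly concentrates on the better arm.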
no code implementations • 18 Feb 2024 • Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu
We start with general arrival distributions and show that a simple policy achieves $O(\sqrt{T})$ regret.
no code implementations • 16 Jan 2024 • Jiachun Li, Kaining Shi, David Simchi-Levi
In this paper, we investigate the tradeoff between loss of social welfare and statistical power in contextual bandit experiments.
no code implementations • 28 Nov 2023 • Xi Chen, David Simchi-Levi, Yining Wang
This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints in scenarios with uncertain demand, achieving an optimal regret upper bound.
no code implementations • 10 Apr 2023 • David Simchi-Levi, Zeyu Zheng, Feng Zhu
A novel policy is proposed to characterize the optimal regret tail probability for any regret threshold.
no code implementations • 7 Jun 2022 • David Simchi-Levi, Zeyu Zheng, Feng Zhu
Starting from the two-armed bandit setting with time horizon $T$, we propose a simple policy and prove that it (i) enjoys worst-case optimality for the expected regret at order $O(\sqrt{T\ln T})$ and (ii) guarantees that the worst-case probability of incurring linear regret decays at the exponential rate $\exp(-\Omega(\sqrt{T}))$, a rate we prove to be the best achievable for all worst-case optimal policies.
no code implementations • 16 Apr 2022 • Prem Talwai, David Simchi-Levi
We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions.
no code implementations • 21 Nov 2021 • Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu
This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL.
no code implementations • 1 Nov 2021 • N. Bora Keskin, David Simchi-Levi, Prem Talwai
The seller does not know the parameters of the products' linear demand model, and can dynamically adjust product prices to learn the demand model based on sales observations.
no code implementations • 28 Jun 2021 • David Simchi-Levi, Zeyu Zheng, Feng Zhu
Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the arm's idle time increases.
no code implementations • 16 May 2021 • Prem Talwai, Ali Shameli, David Simchi-Levi
We develop novel learning rates for conditional mean embeddings by applying the theory of interpolation for reproducing kernel Hilbert spaces (RKHS).
no code implementations • 7 Oct 2020 • Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes.
no code implementations • 7 Oct 2020 • Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
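The gap-dependent behavior referenced here can be seen in the classical UCB1 algorithm — a generic sketch for illustration, not the paper's contextual-bandit contribution: the number of pulls of a suboptimal arm scales like $\log(T)/\Delta^2$, so instances with a large gap $\Delta$ incur less regret.

```python
import math
import random

def ucb1(means, T, seed=0):
    """UCB1 on Bernoulli arms; returns per-arm pull counts.

    On 'easy' instances (large gap between best and second-best mean),
    the suboptimal arm is pulled only O(log T / gap^2) times.
    """
    rng = random.Random(seed)
    K = len(means)
    n = [0] * K       # pull counts
    s = [0.0] * K     # reward sums
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1  # pull each arm once to initialize
        else:
            a = max(range(K),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        n[a] += 1
        s[a] += r
    return n
```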
no code implementations • 28 Sep 2020 • Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Basar
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes (MDPs).
no code implementations • 27 Sep 2020 • Xi Chen, David Simchi-Levi, Yining Wang
In this paper, we consider a dynamic pricing problem over $T$ time periods with an \emph{unknown} demand function of posted price and personalized information.
no code implementations • 30 Jun 2020 • Xiao-Yue Gong, David Simchi-Levi
Motivated by the episodic version of the classical inventory control problem, we propose a new Q-learning-based algorithm, Elimination-Based Half-Q-Learning (HQL), that enjoys improved efficiency over existing algorithms for a wide variety of problems in the one-sided-feedback setting.
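HQL builds on Q-learning; as background only, here is the standard tabular Q-learning update it departs from — this is not HQL's elimination-based, one-sided-feedback variant, and the names are illustrative:

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning step: move Q(s, a) toward the
    bootstrapped target r + gamma * max_a' Q(s', a').

    Q is a list of lists indexed as Q[state][action].
    """
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```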
no code implementations • ICML 2020 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets.
no code implementations • 2 May 2020 • David Simchi-Levi, Rui Sun, Huanan Zhang
We show that our learning algorithm can converge to the optimal algorithm that has access to the true demand functions, and we prove that the convergence rate is tight up to a certain logarithmic term.
no code implementations • 28 Mar 2020 • David Simchi-Levi, Yunzong Xu
We consider the general (stochastic) contextual bandit problem under the realizability assumption, i.e., the expected reward, as a function of contexts and actions, belongs to a general function class $\mathcal{F}$.
no code implementations • 4 Nov 2019 • David Simchi-Levi, Yunzong Xu, Jinglong Zhao
Our work reveals a surprising result: the optimal regret rate is completely characterized by a piecewise-constant function of the switching budget, which further depends on the number of resource constraints. To the best of our knowledge, this is the first time the number of resource constraints has been shown to play a fundamental role in determining the statistical complexity of online learning problems.
no code implementations • ICML 2020 • Jinzhi Bu, David Simchi-Levi, Yunzong Xu
We study a single-product dynamic pricing problem over a selling horizon of $T$ periods.
no code implementations • 7 Jun 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Notably, the interplay between endogeneity and exogeneity presents a unique challenge, absent in existing (stationary and non-stationary) stochastic online learning settings, when we apply the conventional Optimism in the Face of Uncertainty principle to design algorithms with provably low dynamic regret for RL in drifting MDPs.
no code implementations • NeurIPS 2019 • David Simchi-Levi, Yunzong Xu
We consider the classical stochastic multi-armed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget.
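A switching budget can be respected by exploring in contiguous blocks and then committing. The sketch below is a generic explore-then-commit illustration, not the paper's optimal policy; the exploration length `T**(2/3)` and the function name are assumptions for the example:

```python
import random

def etc_with_switch_budget(means, T, seed=0):
    """Explore each Bernoulli arm in one contiguous block, then commit
    to the empirically best arm: the number of action switches is at
    most K (one per block boundary plus the commit), independent of T.
    """
    rng = random.Random(seed)
    K = len(means)
    block = max(1, int(T ** (2 / 3)) // K)  # exploration length per arm
    history, est = [], []
    for a in range(K):
        wins = sum(rng.random() < means[a] for _ in range(block))
        history += [a] * block
        est.append(wins / block)
    best = max(range(K), key=lambda a: est[a])
    history += [best] * (T - len(history))
    switches = sum(1 for i in range(1, len(history)) if history[i] != history[i - 1])
    return history, switches
```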
no code implementations • 19 Mar 2019 • Rong Jin, David Simchi-Levi, Li Wang, Xinshang Wang, Sen Yang
In this paper, we study algorithms for dynamically identifying a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers' ultra-fast delivery platforms.
no code implementations • 4 Mar 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, we can further enjoy the (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner.
no code implementations • 28 Feb 2019 • Hamsa Bastani, David Simchi-Levi, Ruihao Zhu
We study the problem of learning shared structure \emph{across} a sequence of dynamic pricing experiments for related products.
no code implementations • NeurIPS 2018 • Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang
Classically, the time complexity of a first-order method is estimated by its number of gradient computations.
no code implementations • 11 Oct 2018 • Wang Chi Cheung, Will Ma, David Simchi-Levi, Xinshang Wang
We overcome both the challenges of model uncertainty and customer heterogeneity by judiciously synthesizing two algorithmic frameworks from the literature: inventory balancing, which "reserves" a portion of each resource for high-reward customer types which could later arrive, and online learning, which shows how to "explore" the resource consumption distributions of each customer type under different actions.
no code implementations • 6 Oct 2018 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary linear stochastic bandit setting.
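Forgetting stale observations is the core device behind sliding-window methods for drifting environments. The estimator below is a generic sketch of that idea, not the paper's sliding-window UCB or bandit-over-bandit construction:

```python
from collections import deque

class SlidingWindowMean:
    """Reward estimate over only the last `window` observations, so the
    estimate tracks a drifting mean instead of averaging over all time."""

    def __init__(self, window):
        self.buf = deque(maxlen=window)  # old samples fall off automatically

    def update(self, r):
        self.buf.append(r)

    def mean(self):
        return sum(self.buf) / len(self.buf) if self.buf else 0.0
```

After a change point, the window flushes the pre-change samples and the estimate recovers within `window` steps.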
no code implementations • 12 Sep 2017 • Yan Zhao, Xiao Fang, David Simchi-Levi
The problem of constructing such models from randomized experiments data is known as Uplift Modeling in the literature.
2 code implementations • 23 May 2017 • Yan Zhao, Xiao Fang, David Simchi-Levi
It is impossible to know whether the chosen treatment is optimal for an individual subject because response under alternative treatments is unobserved.
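Because individual treatment effects are unobservable, uplift is estimated from group-level contrasts. A minimal two-group (T-learner-style) sketch of this idea, without covariates — an illustration of the estimand, not the authors' method:

```python
def two_model_uplift(data):
    """Estimate average uplift from randomized-experiment data as
    mean(outcome | treated) - mean(outcome | control).

    data: list of (treated: bool, outcome: float) tuples.
    """
    treated = [y for t, y in data if t]
    control = [y for t, y in data if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)
```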
no code implementations • 1 Apr 2017 • Wang Chi Cheung, David Simchi-Levi
We first propose an efficient online policy which incurs a regret $\tilde{O}(T^{2/3})$, where $T$ is the number of customers in the sales horizon.