Search Results for author: Wang Chi Cheung

Found 15 papers, 2 papers with code

Best Arm Identification with Resource Constraints

no code implementations · 29 Feb 2024 · Zitian Li, Wang Chi Cheung

Motivated by the cost heterogeneity in experimentation across different alternatives, we study the Best Arm Identification with Resource Constraints (BAIwRC) problem.

Online Resource Allocation: Bandits feedback and Advice on Time-varying Demands

no code implementations · 8 Feb 2023 · Lixing Lyu, Wang Chi Cheung

Finally, we adapt our model to a network revenue management problem and numerically demonstrate that our algorithm still performs competitively against existing baselines.

Tasks: Management

Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits

no code implementations · 16 Oct 2021 · Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon.

Tasks: Multi-Armed Bandits
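
For context, the two objectives being traded off can be stated compactly; the notation below is illustrative rather than quoted from the paper:

```latex
% Fixed horizon T; arm means \mu_a with best arm a^* (illustrative).
% Regret minimization and best arm identification objectives:
\[
  \mathrm{Reg}(T) = T\,\mu_{a^*} - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{a_t}\Big],
  \qquad
  e(T) = \Pr\big(\hat{a}_T \neq a^*\big),
\]
% where a_t is the arm pulled at time t and \hat{a}_T is the arm
% recommended at the horizon. A policy lies on the Pareto frontier if
% neither objective can be improved without worsening the other.
```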

Reinforcement Learning with Ex-Post Max-Min Fairness

no code implementations · 29 Sep 2021 · Wang Chi Cheung, Zi Yi Ewe

We consider reinforcement learning with vectorial rewards, where the agent receives a vector of $K\geq 2$ different types of rewards at each time step.

Tasks: Fairness, Reinforcement Learning, +1
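
One natural formalization of the ex-post max-min objective, written here as an assumption rather than the paper's exact statement:

```latex
% With reward vectors r_t \in [0,1]^K, an ex-post max-min fair policy
% \pi seeks to maximize the worst realized cumulative reward type:
\[
  \max_{\pi} \; \min_{k \in \{1, \dots, K\}} \; \sum_{t=1}^{T} r_t^{(k)}.
\]
```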

Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm for Stochastic Bandits with Corruptions

1 code implementation · 15 Oct 2020 · Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

When the amount of corruptions per step (CPS) is below a threshold, PSS($u$) identifies the best arm or item with probability tending to $1$ as $T\rightarrow \infty$.
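
A minimal sketch of the sequential-shrinking idea behind PSS, in Python; the phase schedule and elimination rule here are simplifications, and the hyperparameter $u$ and the corruption handling of the actual algorithm are omitted:

```python
import numpy as np

def sequential_shrinking(pull, K, T):
    """Best-arm identification by sequential shrinking (illustrative).

    `pull(arm)` returns a (possibly corrupted) reward in [0, 1].
    Surviving arms are sampled equally in each phase, and the
    empirically weaker half is discarded until one arm remains.
    This is a simplified stand-in for PSS(u), not the paper's
    exact procedure.
    """
    alive = list(range(K))
    n_phases = int(np.ceil(np.log2(K)))
    per_phase = T // n_phases
    for _ in range(n_phases):
        if len(alive) == 1:
            break
        pulls = max(1, per_phase // len(alive))
        means = {a: np.mean([pull(a) for _ in range(pulls)]) for a in alive}
        alive.sort(key=lambda a: means[a], reverse=True)
        alive = alive[: max(1, len(alive) // 2)]  # keep the stronger half
    return alive[0]

# Example: five Bernoulli arms; arm 3 has the highest mean.
mus = [0.2, 0.4, 0.5, 0.8, 0.6]
best = sequential_shrinking(lambda a: np.random.binomial(1, mus[a]), K=5, T=5000)
```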

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

no code implementations · ICML 2020 · Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets.

Tasks: Reinforcement Learning (RL)
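
The variation budgets admit a standard formalization; the notation below is one common choice consistent with the abstract, not a quotation:

```latex
% Drift over a horizon of T steps, measured by reward and transition
% variation budgets (notation illustrative):
\[
  B_r = \sum_{t=1}^{T-1} \max_{s,a} \big| r_t(s,a) - r_{t+1}(s,a) \big|,
  \qquad
  B_p = \sum_{t=1}^{T-1} \max_{s,a} \big\| p_t(\cdot \mid s,a) - p_{t+1}(\cdot \mid s,a) \big\|_{1}.
\]
% Non-stationarity is admissible as long as B_r and B_p stay within
% the prescribed budgets.
```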

Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting

no code implementations · ICML 2020 · Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

Finally, extensive numerical simulations corroborate the efficacy of CascadeBAI as well as the tightness of our upper bound on its time complexity.

Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism

no code implementations · 7 Jun 2019 · Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Notably, the interplay between endogeneity and exogeneity presents a unique challenge, absent in existing (stationary and non-stationary) stochastic online learning settings, when we apply the conventional Optimism in the Face of Uncertainty principle to design algorithms with provably low dynamic regret for RL in drifting MDPs.

Tasks: Decision Making, Reinforcement Learning, +1

Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards

no code implementations · 15 May 2019 · Wang Chi Cheung

In our general setting where a stationary policy could have multiple recurrent classes, the agent faces a subtle yet consequential trade-off in alternating among different actions for balancing the vectorial outcomes.

Hedging the Drift: Learning to Optimize under Non-Stationarity

no code implementations · 4 Mar 2019 · Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, we can further enjoy (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner.

Tasks: Decision Making
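
A rough sketch of a bandit-over-bandit loop in Python: a master EXP3 tunes the window length of a sliding-window base learner across epochs. The epoch structure, window grid, and learning rate are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

def bandit_over_bandit(run_base, windows, n_epochs, rng=None):
    """Bandit-over-bandit meta-loop (illustrative sketch).

    A master EXP3 picks a window length for each epoch; `run_base(w)`
    runs a sliding-window base learner for one epoch with window w and
    returns its total reward, assumed normalized to [0, 1].
    """
    rng = rng or np.random.default_rng()
    J = len(windows)
    weights = np.ones(J)
    eta = np.sqrt(np.log(J) / (n_epochs * J))
    for _ in range(n_epochs):
        probs = weights / weights.sum()
        j = rng.choice(J, p=probs)
        reward = run_base(windows[j])                   # one epoch of the base learner
        weights[j] *= np.exp(eta * reward / probs[j])   # importance-weighted EXP3 update
    return windows[int(np.argmax(weights))]
```

The point of the construction is that no variation budget needs to be known in advance: the master learns a good window length online, which is what makes the overall method parameter-free.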

Inventory Balancing with Online Learning

no code implementations · 11 Oct 2018 · Wang Chi Cheung, Will Ma, David Simchi-Levi, Xinshang Wang

We overcome both the challenges of model uncertainty and customer heterogeneity by judiciously synthesizing two algorithmic frameworks from the literature: inventory balancing, which "reserves" a portion of each resource for high-reward customer types which could later arrive, and online learning, which shows how to "explore" the resource consumption distributions of each customer type under different actions.
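
As a hypothetical illustration of the balancing idea, the score below discounts an optimistic reward estimate by a penalty on nearly depleted resources; the penalty function and the min-aggregation are assumptions, not the paper's exact rule:

```python
import numpy as np

def balanced_score(ucb_reward, consumption, remaining, capacity):
    """Score an action by its optimistic (UCB) reward, discounted by
    how scarce the resources it consumes have become (illustrative).

    `consumption[i]` is the action's expected use of resource i, and
    `remaining[i] / capacity[i]` that resource's remaining fraction.
    Psi(x) = (e / (e - 1)) * (1 - exp(-x)) is a classic balancing
    function from the online-matching literature.
    """
    frac = np.asarray(remaining) / np.asarray(capacity)
    psi = (np.e / (np.e - 1.0)) * (1.0 - np.exp(-frac))
    # Penalize actions that draw on nearly depleted resources, which
    # effectively "reserves" inventory for later high-reward arrivals.
    return ucb_reward * np.min(np.where(np.asarray(consumption) > 0, psi, 1.0))
```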

Learning to Optimize under Non-Stationarity

no code implementations · 6 Oct 2018 · Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for the non-stationary linear stochastic bandit setting.
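
A sliding-window linear UCB step is one way such dynamic regret bounds are typically achieved; in the sketch below, the confidence radius `beta` and regularizer `lam` are placeholder parameters, and data older than `w` rounds is forgotten:

```python
import numpy as np

def sw_linucb_choose(history, arms, w, lam=1.0, beta=1.0):
    """One step of a sliding-window linear UCB rule (illustrative).

    `history` is a list of (x, y) pairs of past feature vectors and
    rewards; only the most recent `w` pairs enter the ridge-regression
    estimate, so stale data from before a drift is discarded.
    """
    d = len(arms[0])
    V = lam * np.eye(d)
    b = np.zeros(d)
    for x, y in history[-w:]:                  # sliding window of recent data
        x = np.asarray(x)
        V += np.outer(x, x)
        b += y * x
    theta = np.linalg.solve(V, b)              # windowed ridge estimate
    V_inv = np.linalg.inv(V)
    ucb = [x @ theta + beta * np.sqrt(x @ V_inv @ x) for x in map(np.asarray, arms)]
    return int(np.argmax(ucb))                 # optimistic arm choice
```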

Thompson Sampling Algorithms for Cascading Bandits

no code implementations · 2 Oct 2018 · Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

While Thompson sampling (TS) algorithms have been shown to be empirically superior to Upper Confidence Bound (UCB) algorithms for cascading bandits, theoretical guarantees are only known for the latter.

Tasks: Efficient Exploration, Multi-Armed Bandits, +2
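
A Beta-Bernoulli Thompson sampling round for a cascading bandit might look as follows; this is a common simple variant and not necessarily the update scheme analyzed in the paper:

```python
import numpy as np

def ts_cascade_step(alpha, beta_, L, feedback, rng=None):
    """One round of Thompson sampling for a cascading bandit (sketch).

    Each item's click probability carries a Beta(alpha[i], beta_[i])
    posterior. Sample, recommend the L items with the highest samples,
    then update from the cascade feedback: items examined before the
    click count as skips, the clicked item as a success.
    """
    rng = rng or np.random.default_rng()
    theta = rng.beta(alpha, beta_)
    ranked = np.argsort(-theta)[:L]            # recommend L highest samples
    click_pos = feedback(ranked)               # index into `ranked`, or None
    upto = len(ranked) if click_pos is None else click_pos + 1
    for pos in range(upto):                    # only examined items are updated
        item = ranked[pos]
        if click_pos is not None and pos == click_pos:
            alpha[item] += 1                   # observed click
        else:
            beta_[item] += 1                   # examined but skipped
    return ranked, click_pos
```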

Assortment Optimization under Unknown MultiNomial Logit Choice Models

no code implementations · 1 Apr 2017 · Wang Chi Cheung, David Simchi-Levi

We first propose an efficient online policy which incurs a regret $\tilde{O}(T^{2/3})$, where $T$ is the number of customers in the sales horizon.
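
One explore-then-commit scheme that exhibits this $T^{2/3}$-type exploration/exploitation split is sketched below, assuming unit revenues and a purchase oracle `offer`; the paper's actual policy and estimator may differ:

```python
import numpy as np

def etc_mnl(offer, N, K, T):
    """Explore-then-commit for assortment selection under an unknown
    MNL model (illustrative sketch, unit revenues assumed).

    `offer(S)` shows assortment S to one customer and returns the
    index of the purchased item, or None for no purchase. Each item
    is offered alone for roughly T**(2/3) total exploration rounds to
    estimate its preference weight v_i from the purchase frequency
    p_i = v_i / (1 + v_i); the top-K items by estimated weight are
    then offered to all remaining customers.
    """
    n_explore = max(1, int(T ** (2 / 3)) // N)
    v_hat = np.zeros(N)
    for i in range(N):
        buys = sum(offer([i]) == i for _ in range(n_explore))
        p = buys / n_explore
        v_hat[i] = p / max(1.0 - p, 1e-9)      # invert p = v / (1 + v)
    # With unit revenues, expected revenue V / (1 + V) increases in the
    # total weight V, so the top-K items form the committed assortment.
    return list(np.argsort(-v_hat)[:K]), v_hat
```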
