no code implementations • 29 Feb 2024 • Zitian Li, Wang Chi Cheung
Motivated by the cost heterogeneity in experimentation across different alternatives, we study the Best Arm Identification with Resource Constraints (BAIwRC) problem.
no code implementations • 8 Feb 2023 • Lixing Lyu, Wang Chi Cheung
Finally, we adapt our model to a network revenue management problem, and numerically demonstrate that our algorithm can still perform competitively compared to existing baselines.
no code implementations • 16 Oct 2021 • Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon.
no code implementations • 29 Sep 2021 • Wang Chi Cheung, Zi Yi Ewe
We consider reinforcement learning with vectorial rewards, where the agent receives a vector of $K\geq 2$ different types of rewards at each time step.
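The vectorial-reward setting above can be illustrated with a toy sketch (not the paper's algorithm): an agent receives a 2-dimensional reward vector each step and greedily steers its running-average reward toward a target point, a simple stand-in for balancing $K\geq 2$ reward types. The arm means and target are assumptions for illustration only.

```python
import random

def vector_bandit(T=2000, target=(0.5, 0.5), seed=0):
    rng = random.Random(seed)
    # Two arms with complementary mean reward vectors (assumed for illustration).
    means = [(0.9, 0.1), (0.1, 0.9)]
    avg = [0.0, 0.0]
    for t in range(1, T + 1):
        # Pick the arm whose mean moves the running average closest to target.
        def dist_after(a):
            nxt = [(avg[i] * (t - 1) + means[a][i]) / t for i in range(2)]
            return sum((nxt[i] - target[i]) ** 2 for i in range(2))
        arm = min(range(2), key=dist_after)
        reward = [means[arm][i] + rng.uniform(-0.05, 0.05) for i in range(2)]
        avg = [(avg[i] * (t - 1) + reward[i]) / t for i in range(2)]
    return avg
```

Under this greedy rule the agent alternates between the two arms, so the running average settles near the target — the kind of trade-off a scalar-reward learner never faces.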
1 code implementation • 15 Oct 2020 • Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
When the amount of corruptions per step (CPS) is below a threshold, PSS($u$) identifies the best arm or item with probability tending to $1$ as $T\rightarrow \infty$.
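As a rough, simplified stand-in for the fixed-budget elimination behind PSS($u$) (the paper's probabilistic shrinking and its corruption handling are not reproduced here), a sequential-halving-style eliminator splits the budget $T$ into phases and discards the worse half of the arms each phase:

```python
import random

def sequential_halving(means, T=4000, seed=1):
    rng = random.Random(seed)
    arms = list(range(len(means)))
    phases = max(1, (len(arms) - 1).bit_length())  # ~log2(#arms) phases
    budget_per_phase = T // phases
    while len(arms) > 1:
        pulls = budget_per_phase // len(arms)
        # Estimate each surviving arm's mean from Bernoulli pulls.
        est = {}
        for a in arms:
            est[a] = sum(rng.random() < means[a] for _ in range(pulls)) / pulls
        arms.sort(key=lambda a: est[a], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]  # keep the better half
    return arms[0]
```

With a clear gap between the best arm and the rest, the surviving arm is the best one with probability tending to 1 as $T$ grows, mirroring the guarantee quoted above.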
no code implementations • ICML 2020 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets.
no code implementations • ICML 2020 • Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
Finally, extensive numerical simulations corroborate the efficacy of CascadeBAI as well as the tightness of our upper bound on its time complexity.
1 code implementation • NeurIPS 2019 • Wang Chi Cheung
We consider an agent who is involved in an online Markov decision process, and receives a vector of outcomes every round.
no code implementations • 7 Jun 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Notably, the interplay between endogeneity and exogeneity presents a unique challenge, absent in existing (stationary and non-stationary) stochastic online learning settings, when we apply the conventional Optimism in the Face of Uncertainty principle to design algorithms with provably low dynamic regret for RL in drifting MDPs.
no code implementations • 15 May 2019 • Wang Chi Cheung
In our general setting where a stationary policy could have multiple recurrent classes, the agent faces a subtle yet consequential trade-off in alternating among different actions for balancing the vectorial outcomes.
no code implementations • 4 Mar 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, we can further enjoy the (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner.
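A minimal sketch of the bandit-over-bandit idea (illustrative only; the environment, base learner, and parameter choices here are simplified assumptions): a top-level EXP3 learner picks a sliding-window length for each block, and a window-limited greedy learner is restarted inside each block with that window.

```python
import math, random

def bob(T=3000, block=100, windows=(10, 50, 100), seed=2):
    rng = random.Random(seed)
    J = len(windows)
    weights = [1.0] * J
    gamma = 0.2  # EXP3 exploration rate
    total = 0.0
    for start in range(0, T, block):
        # Top level: EXP3 samples a window length for this block.
        probs = [(1 - gamma) * w / sum(weights) + gamma / J for w in weights]
        j = rng.choices(range(J), weights=probs)[0]
        win = windows[j]
        # Base level: windowed greedy over 2 arms in a slowly drifting environment.
        history = []  # (arm, reward) pairs; only the last `win` entries are used
        block_reward = 0.0
        for t in range(start, min(start + block, T)):
            drift = 0.5 + 0.4 * math.sin(2 * math.pi * t / T)
            means = [drift, 1 - drift]
            recent = history[-win:]
            est = []
            for a in range(2):
                rs = [r for (arm, r) in recent if arm == a]
                est.append(sum(rs) / len(rs) if rs else 1.0)  # optimistic init
            arm = max(range(2), key=lambda a: est[a])
            r = means[arm] + rng.uniform(-0.1, 0.1)
            history.append((arm, r))
            block_reward += r
            total += r
        # EXP3 update with the block's normalized reward (importance-weighted).
        x = min(1.0, max(0.0, block_reward / block)) / probs[j]
        weights[j] *= math.exp(gamma * x / J)
    return total / T
```

The point of the construction is that the top level learns a good window length online, so no prior knowledge of the variation budget is needed — hence the parameter-free flavor described above.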
no code implementations • 11 Oct 2018 • Wang Chi Cheung, Will Ma, David Simchi-Levi, Xinshang Wang
We overcome both the challenges of model uncertainty and customer heterogeneity by judiciously synthesizing two algorithmic frameworks from the literature: inventory balancing, which "reserves" a portion of each resource for high-reward customer types which could later arrive, and online learning, which shows how to "explore" the resource consumption distributions of each customer type under different actions.
no code implementations • 6 Oct 2018 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for the non-stationary linear stochastic bandit setting.
no code implementations • 2 Oct 2018 • Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
While Thompson sampling (TS) algorithms have been shown to be empirically superior to Upper Confidence Bound (UCB) algorithms for cascading bandits, theoretical guarantees are only known for the latter.
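A hedged sketch of Thompson sampling in a cascading bandit (a simplified Beta-Bernoulli variant for illustration; the paper's TS-Cascade uses different posterior updates): each round the learner samples attraction scores, shows the top-$K$ items, and the user scans them top-down, clicking the first attractive one.

```python
import random

def ts_cascade(w, K=2, T=3000, seed=3):
    """w[i] = true attraction probability of item i (unknown to the learner)."""
    rng = random.Random(seed)
    L = len(w)
    alpha = [1.0] * L  # Beta posterior parameters per item
    beta = [1.0] * L
    clicks = 0
    for _ in range(T):
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(L)]
        ranked = sorted(range(L), key=lambda i: theta[i], reverse=True)[:K]
        for i in ranked:  # cascade model: user scans the list top-down
            if rng.random() < w[i]:
                alpha[i] += 1  # observed click
                clicks += 1
                break  # a click ends the scan; lower items are unobserved
            beta[i] += 1  # observed skip
    return clicks / T, sorted(range(L), key=lambda i: alpha[i] / (alpha[i] + beta[i]),
                              reverse=True)[:K]
```

Note the partial feedback that makes the analysis delicate: items ranked below a click are never observed, so their posteriors are only updated when they are scanned and skipped.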
no code implementations • 1 Apr 2017 • Wang Chi Cheung, David Simchi-Levi
We first propose an efficient online policy which incurs a regret $\tilde{O}(T^{2/3})$, where $T$ is the number of customers in the sales horizon.