no code implementations • 23 Oct 2023 • Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton
We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces uncertainty.
no code implementations • 26 Sep 2023 • Jie Li, Hancheng Cao, Laura Lin, Youyang Hou, Ruihao Zhu, Abdallah El Ali
They emphasized the unique human factors of "enjoyment" and "agency", where humans remain the arbiters of "AI alignment".
no code implementations • 3 Nov 2022 • Qing Feng, Ruihao Zhu, Stefanus Jasin
We consider a setting where a firm sells a product over a horizon of $T$ time steps.
no code implementations • 2 Nov 2022 • Xuejun Zhao, Ruihao Zhu, William B. Haskell
The goal for the supplier is to develop data-driven pricing policies with sublinear regret bounds under a wide range of possible retailer inventory policies for a fixed time horizon.
no code implementations • 4 Aug 2022 • Jingwei Ji, Renyuan Xu, Ruihao Zhu
Then, we rigorously establish near-optimal regret upper bounds, showing that, by leveraging the linear structure, our algorithms dramatically reduce regret compared to existing methods.
no code implementations • 8 Nov 2021 • Ruihao Zhu, Branislav Kveton
Specifically, our goal is to develop a logging policy that efficiently explores different actions to elicit information while achieving competitive reward with a baseline production policy.
no code implementations • 7 Oct 2020 • Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes.
no code implementations • 28 Sep 2020 • Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes (MDPs).
no code implementations • ICML 2020 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets.
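For concreteness, the drifting non-stationarity above is typically formalized via variation budgets. The following is a reconstruction in standard notation, not a verbatim quote from the paper: with $r_t$ and $p_t$ denoting the reward function and transition kernel at step $t$, the budgets bound the cumulative drift as

```latex
B_r = \sum_{t=1}^{T-1} \max_{s,a} \bigl| r_t(s,a) - r_{t+1}(s,a) \bigr|,
\qquad
B_p = \sum_{t=1}^{T-1} \max_{s,a} \bigl\| p_t(\cdot \mid s,a) - p_{t+1}(\cdot \mid s,a) \bigr\|_1 .
```

Both abrupt changes and slow continuous drift are covered, so long as the totals stay within $B_r$ and $B_p$.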
no code implementations • 7 Jun 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Notably, the interplay between endogeneity and exogeneity presents a unique challenge, absent in existing (stationary and non-stationary) stochastic online learning settings, when we apply the conventional Optimism in Face of Uncertainty principle to design algorithms with provably low dynamic regret for RL in drifting MDPs.
no code implementations • 4 Mar 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Equipped with the novel bandit-over-bandit framework, which adapts to the latent changes, we further attain (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner.
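The bandit-over-bandit idea can be sketched as a two-layer loop: a master adversarial bandit (EXP3 here) picks a tuning parameter — a sliding-window length — for each block, and a base sliding-window UCB bandit runs inside the block. All names, constants, and the choice of base algorithm below are illustrative assumptions, not the paper's exact specification.

```python
import math
import random

def bob(means_fn, windows, horizon, block_len, seed=0):
    """Bandit-over-bandit sketch: EXP3 over candidate window lengths,
    sliding-window UCB inside each block. means_fn(t) returns the
    (possibly drifting) Bernoulli means of the arms at time t."""
    rng = random.Random(seed)
    J = len(windows)
    weights = [1.0] * J
    n_blocks = max(1, horizon // block_len)
    gamma = min(1.0, math.sqrt(J * math.log(J) / n_blocks))
    total = 0.0
    for start in range(0, horizon, block_len):
        # Master EXP3 draws a window length for this block.
        wsum = sum(weights)
        probs = [(1 - gamma) * w / wsum + gamma / J for w in weights]
        j = rng.choices(range(J), probs)[0]
        win = windows[j]
        history = []  # (arm, reward) pairs; only the last `win` are used
        block_reward = 0.0
        blen = min(start + block_len, horizon) - start
        for t in range(start, start + blen):
            means = means_fn(t)
            k = len(means)
            recent = history[-win:]
            counts = [0] * k
            sums = [0.0] * k
            for a, r in recent:
                counts[a] += 1
                sums[a] += r
            untried = [a for a in range(k) if counts[a] == 0]
            if untried:
                arm = untried[0]  # play each unseen arm once
            else:
                n = len(recent)
                arm = max(range(k), key=lambda a: sums[a] / counts[a]
                          + math.sqrt(2 * math.log(n) / counts[a]))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            history.append((arm, reward))
            block_reward += reward
        # EXP3 update: importance-weighted block reward in [0, 1].
        x_hat = (block_reward / blen) / probs[j]
        weights[j] *= math.exp(gamma * x_hat / J)
        total += block_reward
    return total
```

The point of the outer layer is that the window length — normally a parameter that must be tuned to the unknown variation budget — is itself learned online, which is what makes the framework parameter-free.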
no code implementations • 28 Feb 2019 • Hamsa Bastani, David Simchi-Levi, Ruihao Zhu
We study the problem of learning shared structure \emph{across} a sequence of dynamic pricing experiments for related products.
no code implementations • 24 Oct 2018 • Ruihao Zhu, Eytan Modiano
We introduce efficient algorithms that achieve nearly optimal regret for the problem of stochastic online shortest path routing with end-to-end feedback.
no code implementations • 6 Oct 2018 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for the non-stationary linear stochastic bandit setting.
no code implementations • NeurIPS 2016 • Jacob D. Abernethy, Kareem Amin, Ruihao Zhu
The learner selects one of $K$ actions (arms), this action generates a random sample from a fixed distribution, and the action then receives a unit payoff in the event that this sample exceeds the threshold value.
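The payoff model above is concrete enough to simulate: each pull draws a sample from that arm's fixed distribution and pays 1 exactly when the sample exceeds the threshold. The sketch below pairs this model with a plain UCB1 learner — an illustrative choice for the learning rule, not the paper's algorithm.

```python
import math
import random

def threshold_bandit_ucb(distributions, threshold, horizon, seed=0):
    """Thresholding-bandit payoff model: pulling arm k draws a sample from
    distributions[k] (a callable taking an RNG) and yields payoff 1 iff
    the sample exceeds `threshold`. A standard UCB1 rule learns which arm
    crosses the threshold most often."""
    rng = random.Random(seed)
    k = len(distributions)
    pulls = [0] * k
    wins = [0] * k
    total = 0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialize: pull each arm once
        else:
            arm = max(range(k), key=lambda i: wins[i] / pulls[i]
                      + math.sqrt(2 * math.log(t) / pulls[i]))
        sample = distributions[arm](rng)
        payoff = 1 if sample > threshold else 0
        pulls[arm] += 1
        wins[arm] += payoff
        total += payoff
    return pulls, total
```

Because the payoff is the indicator of exceeding the threshold, each arm reduces to a Bernoulli arm whose mean is the tail probability of its distribution at the threshold, which is why a Bernoulli-style index works here.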