Search Results for author: Ruihao Zhu

Found 15 papers, 0 papers with code

Efficient and Interpretable Bandit Algorithms

no code implementations • 23 Oct 2023 • Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton

We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces the uncertainty.

User Experience Design Professionals' Perceptions of Generative Artificial Intelligence

no code implementations • 26 Sep 2023 • Jie Li, Hancheng Cao, Laura Lin, Youyang Hou, Ruihao Zhu, Abdallah El Ali

They emphasized the unique human factors of "enjoyment" and "agency", where humans remain the arbiters of "AI alignment".

Phase Transitions in Learning and Earning under Price Protection Guarantee

no code implementations • 3 Nov 2022 • Qing Feng, Ruihao Zhu, Stefanus Jasin

We consider a setting where a firm sells a product over a horizon of $T$ time steps.

Learning to Price Supply Chain Contracts against a Learning Retailer

no code implementations • 2 Nov 2022 • Xuejun Zhao, Ruihao Zhu, William B. Haskell

The goal for the supplier is to develop data-driven pricing policies with sublinear regret bounds under a wide range of possible retailer inventory policies for a fixed time horizon.

Decision Making

Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

no code implementations • 4 Aug 2022 • Jingwei Ji, Renyuan Xu, Ruihao Zhu

Then, we rigorously analyze their near-optimal regret upper bounds to show that, by leveraging the linear structure, our algorithms can dramatically reduce the regret when compared to existing methods.

Decision Making

Safe Data Collection for Offline and Online Policy Learning

no code implementations • 8 Nov 2021 • Ruihao Zhu, Branislav Kveton

Specifically, our goal is to develop a logging policy that efficiently explores different actions to elicit information while achieving competitive reward with a baseline production policy.
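The abstract states the goal but not the method. As an illustration of the trade-off it describes, here is a minimal sketch of a mixture logging policy that follows the baseline production policy most of the time and explores uniformly otherwise; the function name, the `epsilon` rate, and the trivial baseline are hypothetical choices, not the paper's algorithm.

```python
import random

def make_logging_policy(baseline_policy, n_actions, epsilon=0.1, rng=None):
    """Mixture logging policy: with probability `epsilon`, explore an action
    uniformly at random to elicit information; otherwise follow the baseline
    production policy so the collected reward stays competitive.

    `epsilon` is a hypothetical exploration rate, not a tuned value from
    the paper.
    """
    rng = rng or random.Random(0)

    def policy(context):
        if rng.random() < epsilon:
            return rng.randrange(n_actions)  # uniform exploration
        return baseline_policy(context)      # follow the baseline

    return policy

# Usage: a trivial baseline that always plays action 0.
log_pi = make_logging_policy(lambda ctx: 0, n_actions=3, epsilon=0.2)
actions = [log_pi(None) for _ in range(1000)]
```

With `epsilon=0.2`, roughly 87% of pulls match the baseline while every action still receives logging coverage.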

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

no code implementations • ICML 2020 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets.

Reinforcement Learning (RL)

Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism

no code implementations • 7 Jun 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Notably, the interplay between endogeneity and exogeneity presents a unique challenge, absent in existing (stationary and non-stationary) stochastic online learning settings, when we apply the conventional Optimism in Face of Uncertainty principle to design algorithms with provably low dynamic regret for RL in drifting MDPs.

Decision Making, Reinforcement Learning, +1

Hedging the Drift: Learning to Optimize under Non-Stationarity

no code implementations • 4 Mar 2019 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, we can further enjoy the (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner.

Decision Making
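The bandit-over-bandit framework mentioned above runs an adversarial bandit as a meta-layer that tunes a base algorithm's parameter (e.g., a window length) online. As a sketch of such a meta-layer only, here is a generic EXP3 implementation; the class name, learning rate, and reward scaling are generic textbook choices, not the paper's tuning.

```python
import math
import random

class EXP3:
    """EXP3 adversarial bandit, usable as the meta-layer that picks among
    candidate base-algorithm parameters in a bandit-over-bandit scheme.
    Rewards are assumed to lie in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, rng=None):
        self.n = n_arms
        self.gamma = gamma                 # exploration / learning rate
        self.weights = [1.0] * n_arms
        self.rng = rng or random.Random(0)

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def select(self):
        return self.rng.choices(range(self.n),
                                weights=self.probabilities())[0]

    def update(self, arm, reward):
        # Importance-weighted estimate keeps the update unbiased.
        p = self.probabilities()[arm]
        est = reward / p
        self.weights[arm] *= math.exp(self.gamma * est / self.n)
```

In the bandit-over-bandit pattern, each "arm" would be one candidate window length and the reward fed to `update` would be the base learner's realized reward over an epoch.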

Meta Dynamic Pricing: Transfer Learning Across Experiments

no code implementations • 28 Feb 2019 • Hamsa Bastani, David Simchi-Levi, Ruihao Zhu

We study the problem of learning shared structure \emph{across} a sequence of dynamic pricing experiments for related products.

Thompson Sampling, Transfer Learning

Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure

no code implementations • 24 Oct 2018 • Ruihao Zhu, Eytan Modiano

We introduce efficient algorithms which achieve nearly optimal regrets for the problem of stochastic online shortest path routing with end-to-end feedback.

Learning to Optimize under Non-Stationarity

no code implementations • 6 Oct 2018 • Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary linear stochastic bandit setting.
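One standard route to dynamic regret in non-stationary bandits is a sliding window that discards stale observations. The sketch below applies the windowing idea to a simple K-armed bandit; the paper's setting is the richer linear stochastic bandit, so treat this only as an illustration of forgetting, with the window size and confidence constant chosen arbitrarily.

```python
import math
from collections import deque

class SlidingWindowUCB:
    """Simplified sliding-window UCB for a K-armed bandit: statistics are
    computed only over the last `window` pulls, so estimates track a
    drifting environment instead of averaging over all history."""

    def __init__(self, n_arms, window=100, c=2.0):
        self.n_arms = n_arms
        self.history = deque(maxlen=window)  # recent (arm, reward) pairs
        self.c = c                           # confidence-width constant

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, r in self.history:
            counts[arm] += 1
            sums[arm] += r
        # Any arm unseen within the window is pulled first.
        for a in range(self.n_arms):
            if counts[a] == 0:
                return a
        w = len(self.history)
        ucb = [sums[a] / counts[a]
               + math.sqrt(self.c * math.log(w) / counts[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.history.append((arm, reward))  # oldest pair drops out
```

Because old pulls age out of the deque, an arm whose mean has drifted is automatically re-explored once its in-window count decays.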

Threshold Bandits, With and Without Censored Feedback

no code implementations • NeurIPS 2016 • Jacob D. Abernethy, Kareem Amin, Ruihao Zhu

The learner selects one of $K$ actions (arms), this action generates a random sample from a fixed distribution, and the action then receives a unit payoff in the event that this sample exceeds the threshold value.
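Since each pull yields a binary payoff (the sample either exceeds the threshold or not), the uncensored version of this setting reduces to a Bernoulli bandit over the exceedance probabilities, which plain UCB1 handles. The sketch below does exactly that and ignores the censored-feedback variant the paper also studies; the sampler functions and constants are hypothetical.

```python
import math
import random

def ucb1_threshold(sample_fns, threshold, horizon):
    """UCB1 on the binary payoff 1{sample > threshold}. A sketch of the
    uncensored setting only; `sample_fns` are hypothetical per-arm
    samplers returning one random draw each."""
    k = len(sample_fns)
    counts = [0] * k
    payoff_sums = [0.0] * k
    total = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once to initialize
        else:
            arm = max(range(k), key=lambda a:
                      payoff_sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        sample = sample_fns[arm]()
        reward = 1.0 if sample > threshold else 0.0
        counts[arm] += 1
        payoff_sums[arm] += reward
        total += reward
    return counts, total
```

With two Gaussian arms centered at +1 and -1 and a threshold of 0, the first arm exceeds the threshold far more often, so UCB1 concentrates its pulls there.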
