no code implementations • 6 Jan 2022 • Yueyang Liu, Adithya M. Devraj, Benjamin Van Roy, Kuang Xu
We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution and a Gaussian likelihood function when applied instead to a Bernoulli bandit.
1 code implementation • 23 Sep 2021 • Evan Munro, Stefan Wager, Kuang Xu
When randomized trials are run in a marketplace equilibrated by prices, interference arises.
no code implementations • 18 May 2021 • Neil Walton, Kuang Xu
We review the role of information and learning in the stability and optimization of queueing systems.
no code implementations • 7 Mar 2021 • Ruiyang Song, Stefano Rini, Kuang Xu
The causal bandit is a nascent learning model in which an agent sequentially experiments in a causal network of variables in order to identify the reward-maximizing intervention.
no code implementations • 23 Feb 2021 • Jiaming Xu, Kuang Xu, Dana Yang
Convex optimization with feedback is a framework where a learner relies on iterative queries and feedback to arrive at the minimizer of a convex function.
no code implementations • 18 Feb 2021 • Adithya M. Devraj, Benjamin Van Roy, Kuang Xu
The information ratio offers an approach to assessing the efficacy with which an agent balances exploration and exploitation.
no code implementations • NeurIPS 2018 • Kuang Xu
How many queries are necessary and sufficient in order for the learner to accurately estimate the target, while simultaneously concealing the target from the adversary?
no code implementations • 21 Sep 2019 • Jiaming Xu, Kuang Xu, Dana Yang
We study the query complexity of a learner-private sequential learning problem, motivated by the privacy and security concerns due to eavesdropping that arise in practical applications such as pricing and Federated Learning.
no code implementations • 29 Jul 2019 • Kuang Xu, Se-Young Yun
We focus on a family of decision rules where the agent makes a new choice by randomly selecting an action with a probability approximately proportional to the amount of past rewards associated with each action in her memory.
no code implementations • 6 May 2018 • John N. Tsitsiklis, Kuang Xu, Zhi Xu
We formulate a private learning model to study an intrinsic tradeoff between privacy and query complexity in sequential learning.