1 code implementation • 1 Oct 2023 • Yunbei Xu, Assaf Zeevi
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles.
no code implementations • NeurIPS 2020 • Yunbei Xu, Assaf Zeevi
We study problem-dependent rates, i.e., generalization errors that scale tightly with the variance or the effective loss at the "best hypothesis."
no code implementations • 12 Nov 2020 • Yunbei Xu, Assaf Zeevi
We introduce a principled framework dubbed "uniform localized convergence," and characterize sharp problem-dependent rates for central statistical learning problems.
no code implementations • 15 Jul 2020 • Yunbei Xu, Assaf Zeevi
The principle of optimism in the face of uncertainty is one of the most widely used and successful ideas in multi-armed bandits and reinforcement learning.
1 code implementation • 21 Nov 2018 • Yanli Liu, Yunbei Xu, Wotao Yin
They reduce a difficult problem to simple subproblems, so they are easy to implement and have many applications.
Optimization and Control