no code implementations • 11 Nov 2024 • Haolin Liu, Zakaria Mhammedi, Chen-Yu Wei, Julian Zimmert
First, we improve the $\mathrm{poly}(d, A, H)\, T^{5/6}$ regret bound of Zhao et al. (2024) to $\mathrm{poly}(d, A, H)\, T^{2/3}$ for the full-information unknown-transition setting, where $d$ is the rank of the transitions, $A$ is the number of actions, $H$ is the horizon length, and $T$ is the number of episodes.
no code implementations • 10 May 2024 • Julian Zimmert, Teodor V. Marinov
In this work we propose the first incentive-compatible algorithms that enjoy $O(\sqrt{KT})$ regret bounds.
no code implementations • NeurIPS 2023 • Jon Schneider, Julian Zimmert
In this setting, we resolve an open problem of Balseiro et al. by providing an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\widetilde{O}(\sqrt{TK})$, independent of the number of contexts.
no code implementations • 17 Oct 2023 • Haolin Liu, Chen-Yu Wei, Julian Zimmert
The first algorithm, although computationally inefficient, ensures a regret of $\widetilde{\mathcal{O}}\left(\sqrt{K}\right)$, where $K$ is the number of episodes.
no code implementations • 21 Aug 2023 • Saeed Masoudian, Julian Zimmert, Yevgeny Seldin
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.
no code implementations • 20 Feb 2023 • Christoph Dann, Chen-Yu Wei, Julian Zimmert
Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently.
no code implementations • 18 Feb 2023 • Christoph Dann, Chen-Yu Wei, Julian Zimmert
Then we show that under known transitions, we can further obtain a first-order regret bound in the adversarial regime by leveraging the log-barrier regularizer.
no code implementations • 30 Jan 2023 • Yan Dai, Haipeng Luo, Chen-Yu Wei, Julian Zimmert
This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest.
no code implementations • 17 Oct 2022 • Christoph Dann, Chen-Yu Wei, Julian Zimmert
Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards.
no code implementations • NeurIPS 2021 • Christoph Dann, Mehryar Mohri, Tong Zhang, Julian Zimmert
Thompson Sampling is one of the most effective methods for contextual bandits and has been generalized to posterior sampling for certain MDP settings.
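As a point of reference for the posterior-sampling idea discussed here, the following is a minimal sketch of Thompson Sampling for Bernoulli bandits with Beta posteriors. It is not the paper's algorithm; the function name and the offline `rewards` matrix are illustrative assumptions.

```python
import numpy as np

def thompson_bernoulli(rewards, seed=0):
    """Beta-Bernoulli Thompson Sampling sketch.

    `rewards` is a (T, K) array of {0, 1} rewards; only the pulled arm's
    reward is revealed each round. Each arm keeps a Beta(alpha, beta)
    posterior over its mean reward; we sample one mean per arm, play the
    argmax, and update that arm's posterior with the observed reward.
    """
    rng = np.random.default_rng(seed)
    T, K = rewards.shape
    alpha = np.ones(K)                 # Beta(1, 1) uniform priors
    beta = np.ones(K)
    pulls = np.zeros(K, dtype=int)
    for t in range(T):
        theta = rng.beta(alpha, beta)  # one posterior sample per arm
        arm = int(theta.argmax())      # play the arm that looks best
        r = rewards[t, arm]
        alpha[arm] += r                # conjugate Beta update
        beta[arm] += 1 - r
        pulls[arm] += 1
    return pulls
```

Exploration here comes entirely from posterior randomness: arms with uncertain (wide) posteriors occasionally produce the largest sample and get tried again.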
no code implementations • 29 Jun 2022 • Saeed Masoudian, Julian Zimmert, Yevgeny Seldin
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multi-armed bandits with delayed feedback, which, in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin, simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays.
no code implementations • 20 Jun 2022 • Teodor V. Marinov, Mehryar Mohri, Julian Zimmert
We revisit the problem of stochastic online learning with feedback graphs, with the goal of devising algorithms that are optimal, up to constants, both asymptotically and in finite time.
no code implementations • 6 Feb 2022 • Julian Zimmert, Naman Agarwal, Satyen Kale
This algorithm, called SCHRODINGER'S BISONS, is the first efficient algorithm with polylogarithmic regret for this more general problem.
no code implementations • NeurIPS 2021 • Teodor V. Marinov, Julian Zimmert
Recent progress in model selection raises the question of the fundamental limits of these techniques.
no code implementations • 7 Oct 2021 • Chen-Yu Wei, Christoph Dann, Julian Zimmert
We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward.
no code implementations • 6 Oct 2021 • Naman Agarwal, Satyen Kale, Julian Zimmert
Previous work (Foster et al., 2018) has highlighted the importance of improper predictors for achieving "fast rates" in the online multiclass logistic regression problem without suffering exponentially from secondary problem parameters, such as the norm of the predictors in the comparison class.
no code implementations • NeurIPS 2020 • Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert
Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge.
no code implementations • NeurIPS 2021 • Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert
Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds, even in deterministic MDPs, unless there is a unique optimal policy.
no code implementations • NeurIPS 2020 • Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari
Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.
1 code implementation • ICML 2020 • Andrey Kolobov, Sébastien Bubeck, Julian Zimmert
Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated.
no code implementations • 14 Oct 2019 • Julian Zimmert, Yevgeny Seldin
The result requires no advance knowledge of the delays and resolves an open problem of Thune et al. (2019).
no code implementations • NeurIPS 2019 • Julian Zimmert, Tor Lattimore
The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings.
no code implementations • 25 Jan 2019 • Julian Zimmert, Haipeng Luo, Chen-Yu Wei
We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$.
no code implementations • 19 Jul 2018 • Julian Zimmert, Yevgeny Seldin
More generally, we define an adversarial regime with a self-bounding constraint, which includes the stochastic regime, the stochastically constrained adversarial regime (Wei and Luo), and the stochastic regime with adversarial corruptions (Lykouris et al.) as special cases, and show that the algorithm achieves a logarithmic regret guarantee in this regime and all of its special cases simultaneously with the adversarial regret guarantee.
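The algorithm behind this best-of-both-worlds result is FTRL with the 1/2-Tsallis entropy (Tsallis-INF). The sketch below is a simplified illustration under stated assumptions, not the paper's exact implementation: it uses the standard importance-weighted loss estimator (the paper also analyzes a reduced-variance variant), fixes the learning-rate schedule up to constants, and the function name is hypothetical.

```python
import numpy as np

def tsallis_inf(losses, seed=0):
    """Sketch of 1/2-Tsallis-entropy FTRL for a K-armed bandit.

    `losses` is a (T, K) array with entries in [0, 1]; only the pulled
    arm's loss is observed. The FTRL step w_i = 4 / (eta * (L_i - z))^2
    (with z the normalizer making w a distribution) is solved by Newton's
    method on f(z) = sum_i w_i(z) - 1.
    """
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    L = np.zeros(K)                  # cumulative importance-weighted losses
    total = 0.0
    for t in range(1, T + 1):
        eta = 2.0 / np.sqrt(t)       # schedule from the analysis, up to constants
        z = L.min() - 2.0 / eta      # start where max_i w_i = 1, so f(z) >= 0
        for _ in range(50):          # Newton iterations; note dw_i/dz = eta * w_i^{3/2}
            w = 4.0 / (eta * (L - z)) ** 2
            z -= (w.sum() - 1.0) / (eta * w ** 1.5).sum()
        w = 4.0 / (eta * (L - z)) ** 2
        w /= w.sum()                 # clean up residual numerical error
        arm = rng.choice(K, p=w)
        total += losses[t - 1, arm]
        L[arm] += losses[t - 1, arm] / w[arm]  # importance-weighted estimate
        pi = w                        # final sampling distribution (for inspection)
    return total, pi
```

The same update, with no regime-dependent tuning, is what yields $\mathcal{O}(\sqrt{KT})$ regret adversarially and logarithmic regret in the self-bounding regimes described above.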
no code implementations • NeurIPS 2018 • Julian Zimmert, Yevgeny Seldin
We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions.
1 code implementation • 25 Nov 2016 • Maximilian Alber, Julian Zimmert, Urun Dogan, Marius Kloft
Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straightforward way.
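The one-per-class decomposition can be sketched as follows. This is an illustrative toy, not the paper's solver: each binary problem is trained with plain sub-gradient descent on an unregularized hinge loss, and the function names are hypothetical; a real implementation would dispatch each class's problem to a dedicated SVM solver, typically in separate processes.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def train_binary(X, y_bin, epochs=20, lr=0.1):
    """Sub-gradient descent on the linear hinge loss for labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y_bin):
            if yi * (w @ xi) < 1:    # margin violated: take a hinge-loss step
                w += lr * yi * xi
    return w

def train_one_vs_rest(X, y, n_classes, n_jobs=4):
    """Fit one binary classifier per class; the K problems are independent,
    so they are submitted to the pool and trained concurrently."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(train_binary, X, np.where(y == k, 1.0, -1.0))
                   for k in range(n_classes)]
        return np.stack([f.result() for f in futures])  # (n_classes, n_features)

def predict(W, X):
    """Assign each point to the class with the largest decision value."""
    return (X @ W.T).argmax(axis=1)
```

Because the K binary problems share the training data but nothing else, the only coordination needed is collecting the K weight vectors at the end, which is what makes the class-wise parallelization straightforward.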