Search Results for author: Yevgeny Seldin

Found 22 papers, 4 papers with code

An Improved Best-of-both-worlds Algorithm for Bandits with Delayed Feedback

no code implementations • 21 Aug 2023 • Saeed Masoudian, Julian Zimmert, Yevgeny Seldin

Another major contribution is demonstrating that the complexity of best-of-both-worlds bandits with delayed feedback is characterized by the cumulative count of outstanding observations after skipping observations with excessively large delays, rather than by the cumulative delay or the maximal delay.
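
As a rough illustration of this complexity measure, the hypothetical sketch below counts, round by round, how many observations are still outstanding once rounds whose delay exceeds a threshold are skipped; the fixed threshold theta is a stand-in for the paper's tuned skipping rule.

    import random

    def cumulative_outstanding(delays, theta):
        # delays[s] = feedback delay of the action played at round s;
        # rounds with delay above theta are skipped and never counted
        T = len(delays)
        total = 0
        for t in range(T):
            total += sum(1 for s in range(t + 1)
                         if delays[s] <= theta and s + delays[s] > t)
        return total

    rng = random.Random(0)
    delays = [rng.randint(0, 50) for _ in range(100)]
    print(cumulative_outstanding(delays, theta=10))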

Delayed Bandits: When Do Intermediate Observations Help?

no code implementations • 30 May 2023 • Emmanuel Esposito, Saeed Masoudian, Hao Qiu, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin

However, if the mapping of states to losses is stochastic, we show that the regret grows at a rate of $\sqrt{\big(K+\min\{|\mathcal{S}|, d\}\big)T}$ (within log factors), implying that if the number $|\mathcal{S}|$ of states is smaller than the delay, then intermediate observations help.
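
A quick numeric reading of this rate (log factors and constants dropped, all numbers made up): once the number of states $|\mathcal{S}|$ falls below the delay $d$, the bound stops growing with the delay.

    import math

    K, T, d = 10, 100_000, 500                     # arms, horizon, delay
    for S in (5, 50, 5_000):                       # number of states |S|
        with_obs = math.sqrt((K + min(S, d)) * T)  # rate using intermediate observations
        without = math.sqrt((K + d) * T)           # rate ignoring them
        print(f"|S| = {S:5d}: {with_obs:8.0f} vs {without:8.0f}")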

A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

no code implementations • 29 Jun 2022 • Saeed Masoudian, Julian Zimmert, Yevgeny Seldin

We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which, in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin, simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays.

A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

no code implementations • 1 Jun 2022 • Chloé Rouyer, Dirk van der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin

The algorithm combines ideas from the EXP3++ algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme.
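
A minimal sketch of such a mixture, assuming an exponential-weights base distribution and uniform exploration over an exploration set U (for feedback graphs U would be a dominating set); the learning rate and mixing weight below are placeholders, not the paper's tuning.

    import numpy as np

    def play_distribution(cum_loss_est, eta, gamma, explore_set):
        # exponential weights on cumulative loss estimates (stabilized)
        w = np.exp(-eta * (cum_loss_est - cum_loss_est.min()))
        p = w / w.sum()
        # uniform exploration restricted to the exploration set
        u = np.zeros_like(p)
        u[list(explore_set)] = 1.0 / len(explore_set)
        return (1 - gamma) * p + gamma * u

    print(play_distribution(np.array([1.0, 2.0, 0.5]), eta=0.5, gamma=0.1,
                            explore_set={0, 2}))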

Decision Making

Split-kl and PAC-Bayes-split-kl Inequalities for Ternary Random Variables

1 code implementation • 1 Jun 2022 • Yi-Shan Wu, Yevgeny Seldin

We present a new concentration of measure inequality for sums of independent bounded random variables, which we name a split-kl inequality.
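
A hedged sketch of the idea for a ternary variable $Z \in \{-1, 0, 1\}$: write $Z$ as a sum of two binary segments and bound each segment by inverting the kl inequality; the confidence split and constants below are illustrative, not the paper's exact statement.

    import math

    def kl(p, q):
        # binary KL divergence kl(p || q), clamped for numerical safety
        eps = 1e-12
        p = min(max(p, eps), 1 - eps)
        q = min(max(q, eps), 1 - eps)
        return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

    def kl_inv_upper(p_hat, budget):
        # largest q >= p_hat with kl(p_hat || q) <= budget, by bisection
        lo, hi = p_hat, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if kl(p_hat, mid) <= budget:
                lo = mid
            else:
                hi = mid
        return lo

    def split_kl_upper(zs, delta, values=(-1.0, 0.0, 1.0)):
        # Z = b0 + sum_j (b_j - b_{j-1}) * 1[Z >= b_j]; bound each binary
        # indicator mean with a kl bound at confidence delta / 2
        n = len(zs)
        b0, b1, b2 = values
        budget = math.log(4 / delta) / n
        bound = b0
        for b_lo, b_hi in ((b0, b1), (b1, b2)):
            p_hat = sum(1 for z in zs if z >= b_hi) / n
            bound += (b_hi - b_lo) * kl_inv_upper(p_hat, budget)
        return bound

    print(split_kl_upper([1, 0, -1, 0, 1, 1, 0, -1] * 25, delta=0.05))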

Open-Ended Question Answering

An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

no code implementations • 19 Feb 2021 • Chloé Rouyer, Yevgeny Seldin, Nicolò Cesa-Bianchi

In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\left(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big)\sum_{i \neq i^*} \Delta_i^{-1}\right)$, where $\Delta_i$ are the suboptimality gaps and $i^*$ is a unique optimal arm.
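
Plugging made-up numbers into this bound (with the constant hidden by the $O(\cdot)$ dropped, and $\lambda$ denoting the switching cost):

    import math

    lam, K, T = 1.0, 10, 1_000_000        # switching cost, arms, horizon
    gaps = [0.1, 0.2, 0.5]                # suboptimality gaps Delta_i
    bound = ((lam * K) ** (2 / 3) * T ** (1 / 3) + math.log(T)) \
        * sum(1 / g for g in gaps)
    print(f"regret bound ~ {bound:,.0f}")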

An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays

no code implementations • 14 Oct 2019 • Julian Zimmert, Yevgeny Seldin

The result requires no advance knowledge of the delays and resolves an open problem of Thune et al. (2019).

Multi-Armed Bandits

Nonstochastic Multiarmed Bandits with Unrestricted Delays

no code implementations • NeurIPS 2019 • Tobias Sommer Thune, Nicolò Cesa-Bianchi, Yevgeny Seldin

We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays.
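
A minimal sketch of such a wrapper, assuming a base algorithm exposing act()/update(); the fixed threshold stands in for the paper's adaptive skipping rule, and the uniform base is only a stub.

    import random

    class UniformBase:
        # stand-in for any bandit algorithm exposing act() / update()
        def __init__(self, K):
            self.K = K
            self.rng = random.Random(0)
        def act(self):
            return self.rng.randrange(self.K)
        def update(self, arm, loss):
            pass

    class SkippingWrapper:
        # pass observations to the base algorithm only when the delay is
        # small; rounds with excessively large delays are dropped entirely
        def __init__(self, base, threshold):
            self.base = base
            self.threshold = threshold
            self.skipped = 0
        def act(self):
            return self.base.act()
        def observe(self, arm, loss, delay):
            if delay > self.threshold:
                self.skipped += 1
            else:
                self.base.update(arm, loss)

    algo = SkippingWrapper(UniformBase(K=5), threshold=20)
    arm = algo.act()
    algo.observe(arm, loss=0.3, delay=100)    # this observation is skipped
    print(algo.skipped)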

On PAC-Bayesian Bounds for Random Forests

1 code implementation • 23 Oct 2018 • Stephan Sloth Lorenzen, Christian Igel, Yevgeny Seldin

This effect provides a significant boost in performance when the errors are independent or negatively correlated, but when the correlations are strong the advantage from taking the majority vote is small.
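
An illustrative simulation of this effect (all numbers made up), using an equicorrelated Gaussian construction to give m voters a common error probability p and pairwise error correlation rho:

    import math
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    m, n, p = 101, 20_000, 0.4                # voters, test points, individual error
    for rho in (0.0, 0.3, 0.8):               # pairwise error correlation
        shared = rng.standard_normal((n, 1))
        noise = rng.standard_normal((n, m))
        z = math.sqrt(rho) * shared + math.sqrt(1 - rho) * noise
        errors = z < norm.ppf(p)              # each voter errs with probability ~p
        mv_error = (errors.mean(axis=1) > 0.5).mean()
        print(f"rho = {rho}: individual {p}, majority vote {mv_error:.3f}")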

Generalization Bounds

Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits

no code implementations • 19 Jul 2018 • Julian Zimmert, Yevgeny Seldin

More generally, we define an adversarial regime with a self-bounding constraint, which includes the stochastic regime, the stochastically constrained adversarial regime (Wei and Luo), and the stochastic regime with adversarial corruptions (Lykouris et al.) as special cases, and show that the algorithm achieves a logarithmic regret guarantee in this regime and all of its special cases simultaneously with the adversarial regret guarantee.
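
A hedged sketch of a Tsallis-INF style update: follow-the-regularized-leader with 1/2-Tsallis entropy, normalized by Newton's method, with importance-weighted loss estimates. The learning-rate schedule and the made-up Bernoulli losses are illustrative simplifications of the paper's algorithm.

    import numpy as np

    def tsallis_inf_probs(Lhat, eta, iters=40):
        # FTRL with 1/2-Tsallis entropy: p_i = 4 / (eta * (Lhat_i - x))^2,
        # where x < min(Lhat) is the normalizer, found by Newton's method
        x = Lhat.min() - 1.0 / eta            # start where the weights sum to > 1
        for _ in range(iters):
            w = 4.0 / (eta * (Lhat - x)) ** 2
            x -= (w.sum() - 1.0) / (eta * (w ** 1.5).sum())
        p = 4.0 / (eta * (Lhat - x)) ** 2
        return p / p.sum()

    rng = np.random.default_rng(0)
    K, T = 5, 2_000
    means = np.linspace(0.2, 0.6, K)          # made-up Bernoulli loss means
    Lhat = np.zeros(K)
    for t in range(1, T + 1):
        p = tsallis_inf_probs(Lhat, eta=1.0 / np.sqrt(t))
        arm = rng.choice(K, p=p)
        loss = float(rng.random() < means[arm])
        Lhat[arm] += loss / p[arm]            # importance-weighted loss estimate
    print(p.round(3))                         # mass concentrates on the best arm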

Multi-Armed Bandits • Thompson Sampling

Factored Bandits

no code implementations • NeurIPS 2018 • Julian Zimmert, Yevgeny Seldin

We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions.
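
In code, the factored action set is just a Cartesian product (the factor sizes below are arbitrary):

    from itertools import product

    # an action is a tuple of atomic actions, one from each factor
    factors = [range(3), range(2), range(4)]      # |A| = 3 * 2 * 4 = 24
    actions = list(product(*factors))
    print(len(actions), actions[0], actions[-1])  # 24 (0, 0, 0) (2, 1, 3)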

Adaptation to Easy Data in Prediction with Limited Advice

no code implementations • NeurIPS 2018 • Tobias Sommer Thune, Yevgeny Seldin

In addition, we show that in the stochastic setting SODA achieves an $O\left(\sum_{a:\Delta_a>0} \frac{K^3 \varepsilon^2}{\Delta_a}\right)$ pseudo-regret bound that holds simultaneously with the adversarial regret guarantee.
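
Evaluating this pseudo-regret bound on made-up values (the constant hidden by the $O(\cdot)$ is dropped):

    K, eps = 10, 0.05                     # arms and the parameter epsilon from the bound
    gaps = [0.1, 0.3, 0.7]                # positive suboptimality gaps Delta_a
    print(sum(K ** 3 * eps ** 2 / d for d in gaps))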

Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

no code implementations • 22 Aug 2016 • Brian Brost, Yevgeny Seldin, Ingemar J. Cox, Christina Lioma

Online ranker evaluation can be modeled by dueling bandits, a mathematical model for online learning under limited feedback from pairwise comparisons.
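
A minimal model of that feedback, assuming a preference matrix P with P[i][j] the probability that ranker i beats ranker j in a comparison; the matrix below is made up.

    import random

    P = [[0.5, 0.6, 0.7],
         [0.4, 0.5, 0.6],
         [0.3, 0.4, 0.5]]                 # P[i][j] = Pr(i beats j)

    def duel(i, j, rng=random.Random(0)):
        # one pairwise comparison: returns the index of the winner
        return i if rng.random() < P[i][j] else j

    wins = sum(duel(0, 2) == 0 for _ in range(10_000))
    print(wins / 10_000)                  # close to P[0][2] = 0.7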

Online Ranker Evaluation

A Strongly Quasiconvex PAC-Bayesian Bound

no code implementations • 19 Aug 2016 • Niklas Thiemann, Christian Igel, Olivier Wintenberger, Yevgeny Seldin

We propose a new PAC-Bayesian bound and a way of constructing a hypothesis space, so that the bound is convex in the posterior distribution and also convex in a trade-off parameter between empirical performance of the posterior distribution and its complexity.
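
A hedged sketch of alternating minimization for a bound of this shape, assuming the common PAC-Bayes-$\lambda$ form $\mathbb{E}_\rho[\hat L]/(1-\lambda/2) + (\mathrm{KL}(\rho\|\pi) + \ln(2\sqrt{n}/\delta))/(n\lambda(1-\lambda/2))$: given $\lambda$ the optimal posterior is a Gibbs distribution, and given $\rho$ the optimal $\lambda$ has a closed form. Constants may differ from the paper's exact statement.

    import numpy as np

    def alternate(Lhat, n, delta=0.05, iters=20):
        m = len(Lhat)
        pi = np.full(m, 1.0 / m)              # uniform prior over hypotheses
        lmb = 1.0
        for _ in range(iters):
            w = pi * np.exp(-lmb * n * (Lhat - Lhat.min()))  # stabilized
            rho = w / w.sum()                 # Gibbs posterior for this lmb
            c = np.sum(rho * np.log(rho / pi)) + np.log(2 * np.sqrt(n) / delta)
            lmb = 2.0 / (np.sqrt(2 * n * (rho @ Lhat) / c + 1) + 1)
        bound = rho @ Lhat / (1 - lmb / 2) + c / (n * lmb * (1 - lmb / 2))
        return rho, lmb, bound

    rho, lmb, bound = alternate(np.array([0.10, 0.12, 0.30]), n=1000)
    print(rho.round(3), round(float(lmb), 3), round(float(bound), 3))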

PAC-Bayes-Empirical-Bernstein Inequality

no code implementations • NeurIPS 2013 • Ilya O. Tolstikhin, Yevgeny Seldin

The inequality is based on a combination of the PAC-Bayesian bounding technique with the Empirical Bernstein bound.
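
For reference, the sketch below computes a standard empirical Bernstein bound of the Maurer and Pontil form for variables in $[0, 1]$; the PAC-Bayesian combination in the paper replaces the single mean with a posterior average, which is not reproduced here.

    import math
    import statistics

    def empirical_bernstein_upper(xs, delta=0.05):
        # with probability >= 1 - delta, for X in [0, 1]:
        # E[X] <= mean + sqrt(2*V*ln(2/delta)/n) + 7*ln(2/delta)/(3*(n-1))
        n = len(xs)
        mean = sum(xs) / n
        var = statistics.variance(xs)     # unbiased sample variance
        return (mean + math.sqrt(2 * var * math.log(2 / delta) / n)
                + 7 * math.log(2 / delta) / (3 * (n - 1)))

    print(empirical_bernstein_upper([0.1, 0.2, 0.15, 0.12, 0.18] * 40))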

Regression

Advice-Efficient Prediction with Expert Advice

no code implementations • 12 Apr 2013 • Yevgeny Seldin, Peter Bartlett, Koby Crammer

Advice-efficient prediction with expert advice (in analogy to label-efficient prediction) is a variant of the prediction with expert advice game, where on each round of the game we are allowed to ask for the advice of only a limited number $M$ out of $N$ experts.
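
A minimal sketch of the protocol's bookkeeping, assuming uniform querying and an exponential-weights update on importance-weighted losses; the paper's actual sampling and prediction rules are not reproduced, and the random losses are stand-ins.

    import math
    import random

    N, M, eta, T = 10, 3, 0.1, 1_000
    weights = [1.0] * N
    rng = random.Random(0)
    for t in range(T):
        queried = rng.sample(range(N), M)       # ask only M of the N experts
        for i in queried:
            loss = rng.random()                 # stand-in for the expert's loss
            # scaling by N / M keeps the loss estimate unbiased
            weights[i] *= math.exp(-eta * loss * N / M)
    print([round(w, 3) for w in weights])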

PAC-Bayesian Analysis of Contextual Bandits

no code implementations • NeurIPS 2011 • Yevgeny Seldin, Peter Auer, John S. Shawe-Taylor, Ronald Ortner, François Laviolette

The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$.
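
An illustrative computation of $I_{\rho_t}(S;A)$ for a made-up policy with a uniform state marginal, together with the $\sqrt{N I_{\rho_t}(S;A)}$ scaling:

    import numpy as np

    policy = np.array([[0.7, 0.2, 0.1],   # row s: action distribution in state s
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4]])
    N = policy.shape[0]
    joint = policy / N                    # rho(s, a) with a uniform state marginal
    ps = joint.sum(axis=1, keepdims=True)
    pa = joint.sum(axis=0, keepdims=True)
    mi = float(np.sum(joint * np.log(joint / (ps * pa))))
    print(mi, np.sqrt(N * mi))            # I(S;A) and the sqrt(N * I) scaling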

Multi-Armed Bandits
