3 code implementations • 30 Jan 2019 • Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer
Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling.
Ranked #31 on Machine Translation on WMT2014 English-French
1 code implementation • 20 Feb 2020 • Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent.
1 code implementation • NeurIPS 2019 • Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer
Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling.
3 code implementations • ICML 2018 • Vineet Gupta, Tomer Koren, Yoram Singer
Preconditioned gradient methods are among the most general and powerful tools in optimization.
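For a matrix-shaped parameter, the Shampoo algorithm of this paper maintains left and right second-moment statistics of the gradients and preconditions each update by their inverse fourth roots. Below is a minimal NumPy sketch of a single such update; the eigendecomposition-based root computation and the `eps` regularizer are illustrative implementation choices.

```python
import numpy as np

def shampoo_step(W, G, L, R, lr=0.1, eps=1e-4):
    """One Shampoo update for a matrix parameter W (shape m x n)
    with gradient G: accumulate left/right gradient statistics and
    precondition G by their inverse fourth roots."""
    L += G @ G.T                           # left statistics, (m, m)
    R += G.T @ G                           # right statistics, (n, n)

    def inv_fourth_root(M):
        # M is symmetric PSD; eps guards against zero eigenvalues.
        vals, vecs = np.linalg.eigh(M)
        return (vecs * (vals + eps) ** -0.25) @ vecs.T

    W = W - lr * inv_fourth_root(L) @ G @ inv_fourth_root(R)
    return W, L, R
```

Callers would initialize `L` and `R` as zero matrices of shapes (m, m) and (n, n); recomputing the matrix roots only periodically is a common practical optimization this sketch omits.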
6 code implementations • NeurIPS 2019 • Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren
We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high-temperature generalization.
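Concretely, the tempered exponential here is $\exp_t(x) = [1 + (1-t)x]_+^{1/(1-t)}$, which recovers $\exp(x)$ as $t \to 1$. A minimal sketch of the resulting tempered softmax follows; note the paper computes the normalizer with a fixed-point iteration, whereas this sketch uses plain bisection for clarity.

```python
import numpy as np

def exp_t(x, t):
    """Tempered exponential [1 + (1 - t) x]_+^{1/(1 - t)};
    recovers exp(x) in the limit t -> 1."""
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def tempered_softmax(a, t, iters=60):
    """Probabilities p_i = exp_t(a_i - lam), where the normalizer lam
    (generalizing log-sum-exp) makes the p_i sum to 1. The sum is
    decreasing in lam and is >= 1 at lam = max(a), so lam can be
    bracketed and found by bisection."""
    lo, step = float(np.max(a)), 1.0
    hi = lo + step
    while exp_t(a - hi, t).sum() > 1.0:    # grow bracket until sum < 1
        step *= 2.0
        hi = lo + step
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if exp_t(a - mid, t).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    return exp_t(a - 0.5 * (lo + hi), t)
```

With $t > 1$ the resulting distribution has heavier tails than the ordinary softmax, which is what confers robustness to mislabeled or noisy examples.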
1 code implementation • 26 Feb 2020 • Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang
We investigate several confounding factors in the evaluation of optimization algorithms for deep learning.
no code implementations • NeurIPS 2017 • Tomer Koren, Roi Livni, Yishay Mansour
We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions.
no code implementations • 20 Jun 2017 • Vineet Gupta, Tomer Koren, Yoram Singer
We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning.
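As a concrete instance of this framework, the diagonal AdaGrad preconditioner adapts a per-coordinate stepsize to the observed gradients; a minimal sketch (function name and hyperparameters are illustrative):

```python
import numpy as np

def adagrad(grad_fn, x0, lr=0.1, eps=1e-8, steps=1000):
    """Diagonal AdaGrad: each coordinate is scaled by the inverse
    square root of its accumulated squared gradients, i.e. a
    data-dependent diagonal preconditioner."""
    x = np.asarray(x0, dtype=float).copy()
    G = np.zeros_like(x)      # per-coordinate sums of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        G += g * g
        x -= lr * g / (np.sqrt(G) + eps)
    return x
```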
no code implementations • 24 Feb 2017 • Alon Cohen, Tamir Hazan, Tomer Koren
We revisit the study of optimal regret rates in bandit combinatorial optimization: a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems.
no code implementations • 24 Feb 2017 • Tomer Koren, Roi Livni, Yishay Mansour
In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{kT} + T/k)$, where $k$ is the number of actions and $T$ is the time horizon.
no code implementations • 23 May 2016 • Alon Cohen, Tamir Hazan, Tomer Koren
We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is \emph{never fully revealed} to the learner.
no code implementations • 21 Mar 2016 • Elad Hazan, Tomer Koren, Roi Livni, Yishay Mansour
We consider the problem of prediction with expert advice when the losses of the experts have low-dimensional structure: they are restricted to an unknown $d$-dimensional subspace.
no code implementations • 8 Apr 2015 • Elad Hazan, Tomer Koren
We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is $\widetilde{\Theta}(N)$.
no code implementations • 26 Feb 2015 • Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren
We study a general class of online learning problems where the feedback is specified by a graph.
no code implementations • 23 Feb 2015 • Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres
We analyze the minimax regret of the adversarial bandit convex optimization problem.
no code implementations • 29 Jul 2014 • Uriel Feige, Tomer Koren, Moshe Tennenholtz
We consider sequential decision making in a setting where regret is measured with respect to a set of stateful reference policies, and feedback is limited to observing the rewards of the actions performed (the so-called "bandit" setting).
no code implementations • 18 May 2014 • Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres
This class includes problems where the algorithm's loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values.
no code implementations • 15 May 2014 • Elad Hazan, Tomer Koren, Kfir Y. Levy
We show that, in contrast to known asymptotic bounds, as long as the number of prediction/optimization iterations is sub-exponential, the logistic loss provides no improvement over a generic non-smooth loss function such as the hinge loss.
no code implementations • 25 Feb 2014 • Aharon Ben-Tal, Elad Hazan, Tomer Koren, Shie Mannor
Robust optimization is a common framework in optimization under uncertainty when the problem parameters are not known exactly, but are known to belong to some given uncertainty set.
no code implementations • 11 Oct 2013 • Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres
We prove that the player's $T$-round minimax regret in this setting is $\widetilde{\Theta}(T^{2/3})$, thereby closing a fundamental gap in our understanding of learning with bandit feedback.
no code implementations • NeurIPS 2013 • Eshcar Hillel, Zohar Karnin, Tomer Koren, Ronny Lempel, Oren Somekh
That is, distributing learning to $k$ players gives rise to a $\sqrt{k}$-factor parallel speed-up.
no code implementations • ICML 2018 • Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, Kunal Talwar
We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses.
no code implementations • NeurIPS 2017 • Tomer Koren, Roi Livni
We present a new affine-invariant optimization algorithm called Online Lazy Newton.
no code implementations • NeurIPS 2016 • Brian Bullins, Elad Hazan, Tomer Koren
We study regression and classification in a setting where the learning algorithm is allowed to access only a limited number of attributes per example, known as the limited attribute observation model.
no code implementations • NeurIPS 2016 • Michal Feldman, Tomer Koren, Roi Livni, Yishay Mansour, Aviv Zohar
We consider a seller with an unlimited supply of a single good, who is faced with a stream of $T$ buyers.
no code implementations • NeurIPS 2015 • Tomer Koren, Kfir Levy
In this setting, we establish the first evidence that ERM is able to attain fast generalization rates, and show that the expected loss of the ERM solution in $d$ dimensions converges to the optimal expected loss at a rate of $d/n$.
no code implementations • NeurIPS 2015 • Ofer Dekel, Ronen Eldan, Tomer Koren
The best algorithm for the general bandit convex optimization problem guarantees a regret of $\widetilde{O}(T^{5/6})$, while the best known lower bound is $\Omega(T^{1/2})$.
no code implementations • NeurIPS 2014 • Ofer Dekel, Elad Hazan, Tomer Koren
We study an online learning setting where the player is temporarily deprived of feedback each time it switches to a different action.
no code implementations • NeurIPS 2011 • Elad Hazan, Tomer Koren, Nati Srebro
We present an optimization approach for linear SVMs based on a stochastic primal-dual approach, where the primal step is akin to an importance-weighted SGD, and the dual step is a stochastic update on the importance weights.
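The following is a schematic rendering of that primal-dual interplay: the primal step is a hinge-loss SGD step on an example drawn from importance weights, and the dual step is a multiplicative-weights update of those weights toward high-loss examples. This illustrates the general idea only and is not the paper's algorithm; in particular, the full-data dual pass below is exactly the kind of cost the paper's stochastic dual update avoids.

```python
import numpy as np

def primal_dual_svm(X, y, eta=0.1, steps=2000, seed=0):
    """Schematic primal-dual linear SVM: sample an example from the
    importance weights, take a hinge-loss SGD step on it, then
    up-weight examples that currently incur large hinge loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    log_p = np.zeros(n)                      # dual (log) importance weights
    for _ in range(steps):
        p = np.exp(log_p - log_p.max())
        p /= p.sum()
        i = rng.choice(n, p=p)
        if y[i] * (X[i] @ w) < 1.0:          # hinge subgradient at example i
            w += eta * y[i] * X[i]
        norm = np.linalg.norm(w)
        if norm > 1.0:                       # project back to the unit ball
            w /= norm
        hinge = np.maximum(0.0, 1.0 - y * (X @ w))
        log_p += eta * hinge                 # multiplicative-weights dual step
    return w
```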
no code implementations • 17 Feb 2019 • Alon Cohen, Tomer Koren, Yishay Mansour
We present the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics.
no code implementations • 22 Feb 2019 • Anupam Gupta, Tomer Koren, Kunal Talwar
We study the stochastic multi-armed bandits problem in the presence of adversarial corruption.
no code implementations • 23 Apr 2019 • Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar
We consider convex SGD updates with a block-cyclic structure, i.e., where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific distribution.
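A minimal sketch of such a block-cyclic sampling pattern (names illustrative):

```python
def block_cyclic_stream(block_samplers, samples_per_block, num_cycles):
    """Yield samples in block-cyclic order: each cycle visits the
    blocks in a fixed order, and each block contributes many draws
    from its own, possibly distinct, distribution."""
    for _ in range(num_cycles):
        for draw in block_samplers:          # fixed block order per cycle
            for _ in range(samples_per_block):
                yield draw()
```

For instance, `block_samplers` might hold one sampler for daytime users and one for nighttime users, mimicking the federated-learning scenario that motivates this setting.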
no code implementations • ICLR 2020 • Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang
A commonplace belief in the machine learning community is that using adaptive gradient methods hurts generalization.
no code implementations • ICML 2020 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown.
no code implementations • NeurIPS 2020 • Idan Amir, Idan Attias, Tomer Koren, Roi Livni, Yishay Mansour
We revisit the fundamental problem of prediction with expert advice, in a setting where the environment is benign and generates losses stochastically, but the feedback observed by the learner is subject to a moderate adversarial corruption.
no code implementations • NeurIPS 2020 • Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni
The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms.
no code implementations • 10 May 2020 • Vitaly Feldman, Tomer Koren, Kunal Talwar
We also give a linear-time algorithm achieving the optimal bound on the excess loss for the strongly convex case, as well as a faster algorithm for the non-smooth case.
no code implementations • NeurIPS 2020 • Asaf Cassel, Tomer Koren
We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback.
no code implementations • 11 Aug 2020 • Shahar Azulay, Lior Raz, Amir Globerson, Tomer Koren, Yehuda Afek
HoldOut SGD first randomly selects a set of workers that use their private data in order to propose gradient updates.
no code implementations • 1 Jan 2021 • Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent.
no code implementations • NeurIPS 2020 • Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang
State-of-the-art optimization is steadily shifting towards massively parallel pipelines with extremely large batch sizes.
no code implementations • 27 Oct 2020 • Aadirupa Saha, Tomer Koren, Yishay Mansour
We introduce the problem of regret minimization in Adversarial Dueling Bandits.
no code implementations • 31 Jan 2021 • Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
no code implementations • 1 Feb 2021 • Idan Amir, Tomer Koren, Roi Livni
We give a new separation result between the generalization performance of stochastic gradient descent (SGD) and of full-batch gradient descent (GD) in the fundamental stochastic convex optimization model.
no code implementations • NeurIPS 2021 • Amit Attia, Tomer Koren
For convex quadratic objectives, Chen et al. (2018) proved that the uniform stability of the method grows quadratically with the number of optimization steps, and conjectured that the same is true for the general convex and smooth case.
no code implementations • 7 Feb 2021 • Uri Sherman, Tomer Koren
We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds.
1 code implementation • 24 Feb 2021 • Noga Bar, Tomer Koren, Raja Giryes
Yet, the performance of deep neural networks degrades in the presence of noisy labels at train time.
no code implementations • 25 Feb 2021 • Asaf Cassel, Tomer Koren
We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem.
no code implementations • 2 Mar 2021 • Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar
Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy.
no code implementations • 4 Jun 2021 • Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour
We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
no code implementations • NeurIPS 2021 • Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O( \sigma^2/\epsilon^4 + \tau/\epsilon^2 )$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the \emph{average} delay $\smash{\frac{1}{T}\sum_{t=1}^T d_t}$ and $\sigma^2$ is the variance of the stochastic gradients.
no code implementations • NeurIPS 2021 • Uri Sherman, Tomer Koren, Yishay Mansour
We study online convex optimization in the random order model, recently proposed by Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order.
no code implementations • NeurIPS 2021 • Idan Amir, Yair Carmon, Tomer Koren, Roi Livni
We study the generalization performance of full-batch optimization algorithms for stochastic convex optimization: these are first-order methods that access only the exact gradient of the empirical risk (rather than gradients with respect to individual data points), and include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants.
no code implementations • 20 Jul 2021 • Liad Erez, Tomer Koren
We study the online learning with feedback graphs framework introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph $G$ over the available actions.
no code implementations • 29 Sep 2021 • Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang
In the empirical science of training large neural networks, the learning rate schedule is a notoriously challenging-to-tune hyperparameter, which can depend on all other properties (architecture, optimizer, batch size, dataset, regularization, ...) of the problem.
no code implementations • NeurIPS 2021 • Liad Erez, Tomer Koren
We study the online learning with feedback graphs framework introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph $G$ over the available actions.
no code implementations • 27 Feb 2022 • Matan Schliserman, Tomer Koren
Finally, as direct applications of the general bounds, we return to the setting of linear classification with separable data and establish several novel test loss and test accuracy bounds for gradient descent and stochastic gradient descent for a variety of loss functions with different tail decay rates.
no code implementations • 27 Feb 2022 • Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman
We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data.
no code implementations • 2 Mar 2022 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of controlling an unknown linear dynamical system under a stochastic convex cost and full feedback of both the state and cost function.
no code implementations • 3 Jun 2022 • Asaf Cassel, Alon Cohen, Tomer Koren
We consider the problem of controlling an unknown linear dynamical system under adversarially changing convex costs and full feedback of both the state and cost function.
no code implementations • 7 Jun 2022 • Idan Amir, Guy Azov, Tomer Koren, Roi Livni
We study best-of-both-worlds algorithms for bandits with switching costs, recently addressed by Rouyer, Seldin, and Cesa-Bianchi (2021).
no code implementations • 17 Jul 2022 • Amit Attia, Tomer Koren
We consider the problem of designing uniformly stable first-order optimization algorithms for empirical risk minimization.
no code implementations • 28 Jul 2022 • Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour
Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence.
no code implementations • 27 Sep 2022 • Aadirupa Saha, Tomer Koren, Yishay Mansour
We address the problem of \emph{convex optimization with dueling feedback}, where the goal is to minimize a convex function given a weaker form of \emph{dueling} feedback.
no code implementations • 24 Oct 2022 • Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar
Our lower bounds also show a separation between pure and approximate differential privacy for adaptive adversaries where the latter is necessary to achieve the non-private $O(\sqrt{T})$ regret.
no code implementations • 30 Jan 2023 • Uri Sherman, Tomer Koren, Yishay Mansour
We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full-information feedback or exploratory conditions. We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror descent and least-squares policy evaluation in an auxiliary MDP used to compute exploration bonuses. Our algorithm obtains an $\widetilde O(K^{6/7})$ regret bound, improving significantly over the previous state-of-the-art of $\widetilde O(K^{14/15})$ in this setting.
no code implementations • 17 Feb 2023 • Amit Attia, Tomer Koren
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization.
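A minimal sketch of the scalar ('AdaGrad-Norm') variant of these stepsizes, assuming that is the version in question: the learning rate self-tunes from the accumulated gradient norms, with no manually decayed schedule.

```python
import numpy as np

def sgd_adagrad_norm(grad_fn, x0, eta=1.0, steps=1000):
    """SGD with the scalar AdaGrad stepsize: at each step the
    learning rate is eta / sqrt(sum of squared gradient norms)."""
    x = np.asarray(x0, dtype=float).copy()
    acc = 1e-12                    # guards the very first division
    for _ in range(steps):
        g = grad_fn(x)
        acc += float(g @ g)
        x -= (eta / np.sqrt(acc)) * g
    return x
```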
no code implementations • 27 Feb 2023 • Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar
We also develop an adaptive algorithm for the small-loss setting with regret $O(L^\star\log d + \varepsilon^{-1} \log^{1.5} d)$, where $L^\star$ is the total loss of the best expert.
no code implementations • 28 Aug 2023 • Uri Sherman, Alon Cohen, Tomer Koren, Yishay Mansour
We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal $\widetilde O(\sqrt{K})$ regret where $K$ denotes the number of episodes.
no code implementations • 23 Nov 2023 • Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice.
no code implementations • 19 Dec 2023 • Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour
We next study an $m$-multiway comparison ('battling') feedback model, where the learner gets to see the argmin feedback of an $m$-subset of queried points, and show a convergence rate of $\smash{\widetilde O}(\frac{d}{\min\{\log m, d\}\epsilon})$.
no code implementations • 22 Jan 2024 • Matan Schliserman, Uri Sherman, Tomer Koren
Our bound translates to a lower bound of $\Omega (\sqrt{d})$ on the number of training examples required for standard GD to reach a non-trivial test error, answering an open question raised by Feldman (2016) and Amir, Koren, and Livni (2021b) and showing that a non-trivial dimension dependence is unavoidable.
no code implementations • 5 Feb 2024 • Amit Attia, Tomer Koren
We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods, without requiring significant knowledge of the true problem parameters.
no code implementations • 5 Mar 2024 • Amit Attia, Tomer Koren
This short note describes a simple technique for analyzing probabilistic algorithms that rely on a light-tailed (but not necessarily bounded) source of randomization.