Search Results for author: Tomer Koren

Found 74 papers, 7 papers with code

Memory-Efficient Adaptive Optimization

3 code implementations 30 Jan 2019 Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling.

Language Modelling Machine Translation +1
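
As background for what "adaptive" means in this entry, here is a minimal diagonal-Adagrad step in NumPy. This is textbook Adagrad shown for orientation only, not the memory-efficient method proposed in the paper, and all names are illustrative.

```python
import numpy as np

def adagrad_step(param, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal-Adagrad update: each coordinate gets its own
    stepsize, scaled by the accumulated squared gradients."""
    accum += grad ** 2                       # per-coordinate second-moment accumulator
    param -= lr * grad / (np.sqrt(accum) + eps)
    return param, accum

# Usage: the accumulator has the same shape as the parameters,
# which is exactly the memory overhead the paper seeks to reduce.
w, acc = np.zeros(4), np.zeros(4)
g = np.array([0.5, -0.1, 0.0, 0.2])
w, acc = adagrad_step(w, g, acc)
```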

Scalable Second Order Optimization for Deep Learning

1 code implementation 20 Feb 2020 Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent.

Image Classification Language Modelling +2

Memory Efficient Adaptive Optimization

1 code implementation NeurIPS 2019 Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling.

Language Modelling Machine Translation +1

Shampoo: Preconditioned Stochastic Tensor Optimization

3 code implementations ICML 2018 Vineet Gupta, Tomer Koren, Yoram Singer

Preconditioned gradient methods are among the most general and powerful tools in optimization.

Stochastic Optimization
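
For the Shampoo entry above, a hedged sketch of the matrix-case update: left and right gradient statistics act as Kronecker-factored preconditioners applied through inverse fourth roots. The eigendecomposition-based matrix power and the constants below are implementation choices of this sketch.

```python
import numpy as np

def matrix_power(mat, p, eps=1e-6):
    """Symmetric PSD matrix raised to a (possibly negative fractional) power."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.maximum(vals, eps)
    return (vecs * vals ** p) @ vecs.T

def shampoo_step(W, G, L, R, lr=0.1):
    """One Shampoo-style step for a matrix parameter W (shape m x n):
    L and R accumulate left/right gradient statistics and precondition
    the gradient from both sides."""
    L += G @ G.T          # m x m left statistic
    R += G.T @ G          # n x n right statistic
    precond_grad = matrix_power(L, -0.25) @ G @ matrix_power(R, -0.25)
    W -= lr * precond_grad
    return W, L, R

# Usage sketch: statistics start at a small multiple of the identity.
m, n = 3, 5
W = np.zeros((m, n))
L, R = 1e-4 * np.eye(m), 1e-4 * np.eye(n)
G = np.random.randn(m, n)
W, L, R = shampoo_step(W, G, L, R)
```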

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

6 code implementations NeurIPS 2019 Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren

We introduce a temperature into the exponential function and replace the softmax output layer of neural nets by a high temperature generalization.
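
A small sketch of the tempered exponential and logarithm that the abstract alludes to, following the standard definitions used in this line of work ($\exp_t$ and $\log_t$ with temperature $t$). The bi-tempered loss additionally requires a normalization step to turn $\exp_t$ into a tempered softmax, which is omitted here.

```python
import numpy as np

def log_t(x, t):
    """Tempered logarithm; recovers np.log(x) in the limit t -> 1."""
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential; recovers np.exp(x) in the limit t -> 1.
    For t < 1 it has bounded support, for t > 1 it has a heavier tail."""
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

# A bi-tempered softmax replaces exp with exp_t and picks a normalization
# constant so the outputs sum to one (typically via a short fixed-point loop);
# that normalization is not shown in this sketch.
print(exp_t(np.array([-1.0, 0.0, 1.0]), t=1.5))
```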

Disentangling Adaptive Gradient Methods from Learning Rates

1 code implementation 26 Feb 2020 Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang

We investigate several confounding factors in the evaluation of optimization algorithms for deep learning.

Multi-Armed Bandits with Metric Movement Costs

no code implementations NeurIPS 2017 Tomer Koren, Roi Livni, Yishay Mansour

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions.

Multi-Armed Bandits

A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization

no code implementations 20 Jun 2017 Vineet Gupta, Tomer Koren, Yoram Singer

We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning.

Stochastic Optimization
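
To make "data-dependent regularization, also termed preconditioning" concrete, below is a sketch of one well-known member of that family, full-matrix Adagrad, where the preconditioner is the matrix square root of the accumulated gradient outer products. This is an illustrative instance, not the paper's general framework.

```python
import numpy as np

def full_matrix_adagrad_step(x, g, G_sum, lr=0.1, eps=1e-6):
    """One data-dependent preconditioned update: the preconditioner is
    H = (sum of gradient outer products)^{1/2}, and the step is -lr * H^{-1} g."""
    G_sum += np.outer(g, g)                      # accumulate gradient statistics
    vals, vecs = np.linalg.eigh(G_sum + eps * np.eye(len(x)))
    H_inv = (vecs / np.sqrt(vals)) @ vecs.T      # H^{-1} with H = G_sum^{1/2}
    x -= lr * H_inv @ g
    return x, G_sum

# Usage: d-dimensional iterate, statistics start at zero.
d = 4
x, G_sum = np.zeros(d), np.zeros((d, d))
x, G_sum = full_matrix_adagrad_step(x, np.array([1.0, -2.0, 0.5, 0.0]), G_sum)
```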

Tight Bounds for Bandit Combinatorial Optimization

no code implementations 24 Feb 2017 Alon Cohen, Tamir Hazan, Tomer Koren

We revisit the study of optimal regret rates in bandit combinatorial optimization, a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems.

Combinatorial Optimization Decision Making +1

Bandits with Movement Costs and Adaptive Pricing

no code implementations 24 Feb 2017 Tomer Koren, Roi Livni, Yishay Mansour

In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{kT} + T/k)$, where $k$ is the number of actions and $T$ is the time horizon.
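
As a hedged aside on how the two terms in this bound trade off, assuming (as in the adaptive-pricing part of the title) that the learner can choose how finely to discretize the action space into $k$ actions, the two terms balance when

$\sqrt{kT} = T/k \;\Longleftrightarrow\; k = \Theta(T^{1/3}), \qquad \text{giving } \widetilde{O}(\sqrt{kT} + T/k) = \widetilde{O}(T^{2/3}),$

which matches the order of the switching-costs rate appearing further down this page.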

Online Learning with Feedback Graphs Without the Graphs

no code implementations 23 May 2016 Alon Cohen, Tamir Hazan, Tomer Koren

We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is never fully revealed to the learner.

Online Learning with Low Rank Experts

no code implementations 21 Mar 2016 Elad Hazan, Tomer Koren, Roi Livni, Yishay Mansour

We consider the problem of prediction with expert advice when the losses of the experts have low-dimensional structure: they are restricted to an unknown $d$-dimensional subspace.

The Computational Power of Optimization in Online Learning

no code implementations 8 Apr 2015 Elad Hazan, Tomer Koren

We also give a lower bound showing that this running time cannot be improved (up to log factors) in the oracle model, thereby exhibiting a quadratic speedup as compared to the standard, oracle-free setting where the required time for vanishing regret is $\widetilde{\Theta}(N)$.

Online Learning with Feedback Graphs: Beyond Bandits

no code implementations 26 Feb 2015 Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren

We study a general class of online learning problems where the feedback is specified by a graph.

Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension

no code implementations 23 Feb 2015 Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres

We analyze the minimax regret of the adversarial bandit convex optimization problem.

Thompson Sampling

Chasing Ghosts: Competing with Stateful Policies

no code implementations 29 Jul 2014 Uriel Feige, Tomer Koren, Moshe Tennenholtz

We consider sequential decision making in a setting where regret is measured with respect to a set of stateful reference policies, and feedback is limited to observing the rewards of the actions performed (the so-called "bandit" setting).

Attribute Decision Making +1

Online Learning with Composite Loss Functions

no code implementations 18 May 2014 Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres

This class includes problems where the algorithm's loss is the minimum over the recent adversarial values, the maximum over the recent values, or a linear combination of the recent values.

Logistic Regression: Tight Bounds for Stochastic and Online Optimization

no code implementations 15 May 2014 Elad Hazan, Tomer Koren, Kfir Y. Levy

We show that in contrast to known asymptotic bounds, as long as the number of prediction/optimization iterations is sub-exponential, the logistic loss provides no improvement over a generic non-smooth loss function such as the hinge loss.

regression

Oracle-Based Robust Optimization via Online Learning

no code implementations 25 Feb 2014 Aharon Ben-Tal, Elad Hazan, Tomer Koren, Shie Mannor

Robust optimization is a common framework for optimization under uncertainty in which the problem parameters are not known exactly, but are known to belong to some given uncertainty set.

Bandits with Switching Costs: T^{2/3} Regret

no code implementations 11 Oct 2013 Ofer Dekel, Jian Ding, Tomer Koren, Yuval Peres

We prove that the player's $T$-round minimax regret in this setting is $\widetilde{\Theta}(T^{2/3})$, thereby closing a fundamental gap in our understanding of learning with bandit feedback.

Online Linear Quadratic Control

no code implementations ICML 2018 Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, Kunal Talwar

We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses.

Affine-Invariant Online Optimization and the Low-rank Experts Problem

no code implementations NeurIPS 2017 Tomer Koren, Roi Livni

We present a new affine-invariant optimization algorithm called Online Lazy Newton.

The Limits of Learning with Missing Data

no code implementations NeurIPS 2016 Brian Bullins, Elad Hazan, Tomer Koren

We study regression and classification in a setting where the learning algorithm is allowed to access only a limited number of attributes per example, known as the limited attribute observation model.

Attribute General Classification +1

Online Pricing with Strategic and Patient Buyers

no code implementations NeurIPS 2016 Michal Feldman, Tomer Koren, Roi Livni, Yishay Mansour, Aviv Zohar

We consider a seller with an unlimited supply of a single good, who is faced with a stream of $T$ buyers.

Fast Rates for Exp-concave Empirical Risk Minimization

no code implementations NeurIPS 2015 Tomer Koren, Kfir Levy

In this setting, we establish the first evidence that ERM is able to attain fast generalization rates, and show that the expected loss of the ERM solution in $d$ dimensions converges to the optimal expected loss at a rate of $d/n$.

regression Stochastic Optimization

Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff

no code implementations NeurIPS 2015 Ofer Dekel, Ronen Eldan, Tomer Koren

The best algorithm for the general bandit convex optimization problem guarantees a regret of $\widetilde{O}(T^{5/6})$, while the best known lower bound is $\Omega(T^{1/2})$.

The Blinded Bandit: Learning with Adaptive Feedback

no code implementations NeurIPS 2014 Ofer Dekel, Elad Hazan, Tomer Koren

We study an online learning setting where the player is temporarily deprived of feedback each time it switches to a different action.

Beating SGD: Learning SVMs in Sublinear Time

no code implementations NeurIPS 2011 Elad Hazan, Tomer Koren, Nati Srebro

We present an optimization approach for linear SVMs based on a stochastic primal-dual approach, where the primal step is akin to an importance-weighted SGD, and the dual step is a stochastic update on the importance weights.
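
A heavily simplified sketch of the primal-dual scheme this abstract describes: the primal step is a subgradient step on one example sampled from a maintained distribution, and the dual step multiplicatively reweights examples toward small margins. For clarity the margins are computed exactly here; the actual sublinear-time algorithm estimates them by sampling coordinates, and all names and constants below are illustrative.

```python
import numpy as np

def primal_dual_svm_sketch(X, y, T=1000, lr=0.1, eta=0.1):
    """Simplified importance-weighted primal-dual SVM sketch (not the
    paper's sublinear algorithm): sample an example from the dual weights,
    take a hinge subgradient step, then multiplicatively shift weight
    toward low-margin (hard) examples."""
    n, d = X.shape
    w = np.zeros(d)
    p = np.full(n, 1.0 / n)                 # dual distribution over examples
    for _ in range(T):
        i = np.random.choice(n, p=p)        # sample an example from the dual weights
        if y[i] * X[i] @ w < 1.0:           # hinge subgradient on the sampled example
            w += lr * y[i] * X[i]
        margins = y * (X @ w)               # exact margins (the costly part, kept for clarity)
        p *= np.exp(-eta * np.clip(margins, -1.0, 1.0))
        p /= p.sum()                        # renormalize the distribution
    return w

# Usage on a toy separable dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)); y = np.sign(X[:, 0] + 0.1)
w = primal_dual_svm_sketch(X, y, T=200)
```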

Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

no code implementations 17 Feb 2019 Alon Cohen, Tomer Koren, Yishay Mansour

We present the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics.

Open-Ended Question Answering

Better Algorithms for Stochastic Bandits with Adversarial Corruptions

no code implementations 22 Feb 2019 Anupam Gupta, Tomer Koren, Kunal Talwar

We study the stochastic multi-armed bandits problem in the presence of adversarial corruption.

Multi-Armed Bandits

Semi-Cyclic Stochastic Gradient Descent

no code implementations 23 Apr 2019 Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar

We consider convex SGD updates with a block-cyclic structure, i.e., where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution.

Federated Learning
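
A small sketch of the block-cyclic data model from this entry: each cycle visits a fixed sequence of blocks, and each block draws from its own distribution. The plain SGD step, the toy squared loss, and the one-sample-per-block simplification are illustrative choices, not the paper's algorithmic contribution.

```python
import numpy as np

def block_cyclic_sgd(block_samplers, cycles, lr=0.01, dim=10):
    """Sketch of block-cyclic SGD data access: each cycle (e.g. a day)
    visits the blocks in a fixed order, and every block draws its samples
    from its own, possibly different, distribution.  `block_samplers` is
    a list of callables returning (x, y) pairs from each block."""
    w = np.zeros(dim)
    for _ in range(cycles):
        for sample in block_samplers:       # blocks visited in a fixed cyclic order
            x, y = sample()
            grad = (w @ x - y) * x          # plain SGD step on a squared-loss example
            w -= lr * grad
    return w

# Usage: two blocks with shifted input distributions, e.g. day vs. night traffic.
rng = np.random.default_rng(0)
day = lambda: (rng.normal(1.0, 1.0, 10), 1.0)
night = lambda: (rng.normal(-1.0, 1.0, 10), -1.0)
w = block_cyclic_sgd([day, night], cycles=100)
```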

Revisiting the Generalization of Adaptive Gradient Methods

no code implementations ICLR 2020 Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang

A commonplace belief in the machine learning community is that using adaptive gradient methods hurts generalization.

BIG-bench Machine Learning

Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently

no code implementations ICML 2020 Asaf Cassel, Alon Cohen, Tomer Koren

We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown.

Prediction with Corrupted Expert Advice

no code implementations NeurIPS 2020 Idan Amir, Idan Attias, Tomer Koren, Roi Livni, Yishay Mansour

We revisit the fundamental problem of prediction with expert advice, in a setting where the environment is benign and generates losses stochastically, but the feedback observed by the learner is subject to a moderate adversarial corruption.

Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study

no code implementations NeurIPS 2020 Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni

The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms.

Private Stochastic Convex Optimization: Optimal Rates in Linear Time

no code implementations 10 May 2020 Vitaly Feldman, Tomer Koren, Kunal Talwar

We also give a linear-time algorithm achieving the optimal bound on the excess loss for the strongly convex case, as well as a faster algorithm for the non-smooth case.

Bandit Linear Control

no code implementations NeurIPS 2020 Asaf Cassel, Tomer Koren

We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback.

Holdout SGD: Byzantine Tolerant Federated Learning

no code implementations 11 Aug 2020 Shahar Azulay, Lior Raz, Amir Globerson, Tomer Koren, Yehuda Afek

HoldOut SGD first randomly selects a set of workers that use their private data in order to propose gradient updates.

Federated Learning
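
A sketch of a HoldOut-SGD-style round. Phase 1 follows the abstract snippet above (a random subset of workers proposes gradient updates from their private data); phase 2, in which the remaining held-out workers score the proposals and only the best-ranked ones are aggregated, is an assumption of this sketch rather than a quote from the paper, and the toy Worker class is purely illustrative.

```python
import numpy as np

class Worker:
    """Toy worker holding private (X, y) data for a least-squares objective."""
    def __init__(self, X, y):
        self.X, self.y = X, y
    def gradient(self, w):
        return self.X.T @ (self.X @ w - self.y) / len(self.y)
    def loss(self, w):
        return 0.5 * np.mean((self.X @ w - self.y) ** 2)

def holdout_sgd_round(workers, w, propose_frac=0.5, keep_frac=0.5, lr=0.1):
    """One round of a HoldOut-SGD-style protocol.  Phase 1: a random subset
    of workers proposes gradients from their private data (as in the
    abstract).  Phase 2 (held-out workers score proposals; only the
    best-ranked survive) is an assumption of this sketch."""
    n = len(workers)
    idx = np.random.permutation(n)
    cut = int(propose_frac * n)
    proposers = [workers[i] for i in idx[:cut]]
    holdout = [workers[i] for i in idx[cut:]]
    proposals = [p.gradient(w) for p in proposers]                                  # phase 1
    scores = [np.mean([h.loss(w - lr * g) for h in holdout]) for g in proposals]    # phase 2
    keep = np.argsort(scores)[: max(1, int(keep_frac * len(proposals)))]            # lower loss wins
    return w - lr * np.mean([proposals[i] for i in keep], axis=0)

# Usage: ten workers with private least-squares data.
rng = np.random.default_rng(0)
workers = [Worker(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(10)]
w = holdout_sgd_round(workers, np.zeros(5))
```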

Towards Practical Second Order Optimization for Deep Learning

no code implementations 1 Jan 2021 Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent.

Click-Through Rate Prediction Image Classification +4

Stochastic Optimization with Laggard Data Pipelines

no code implementations NeurIPS 2020 Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang

State-of-the-art optimization is steadily shifting towards massively parallel pipelines with extremely large batch sizes.

Stochastic Optimization

Adversarial Dueling Bandits

no code implementations 27 Oct 2020 Aadirupa Saha, Tomer Koren, Yishay Mansour

We introduce the problem of regret minimization in Adversarial Dueling Bandits.

Online Markov Decision Processes with Aggregate Bandit Feedback

no code implementations 31 Jan 2021 Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.

SGD Generalizes Better Than GD (And Regularization Doesn't Help)

no code implementations 1 Feb 2021 Idan Amir, Tomer Koren, Roi Livni

We give a new separation result between the generalization performance of stochastic gradient descent (SGD) and of full-batch gradient descent (GD) in the fundamental stochastic convex optimization model.

Algorithmic Instabilities of Accelerated Gradient Descent

no code implementations NeurIPS 2021 Amit Attia, Tomer Koren

For convex quadratic objectives, Chen et al. (2018) proved that the uniform stability of the method grows quadratically with the number of optimization steps, and conjectured that the same is true for the general convex and smooth case.

Lazy OCO: Online Convex Optimization on a Switching Budget

no code implementations 7 Feb 2021 Uri Sherman, Tomer Koren

We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds.

Multiplicative Reweighting for Robust Neural Network Optimization

1 code implementation 24 Feb 2021 Noga Bar, Tomer Koren, Raja Giryes

Yet, their performance degrades in the presence of noisy labels at train time.

Adversarial Robustness

Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret

no code implementations 25 Feb 2021 Asaf Cassel, Tomer Koren

We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem.

Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry

no code implementations 2 Mar 2021 Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy.

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

no code implementations 4 Jun 2021 Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.

Multi-Armed Bandits

Asynchronous Stochastic Optimization Robust to Arbitrary Delays

no code implementations NeurIPS 2021 Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O( \sigma^2/\epsilon^4 + \tau/\epsilon^2 )$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the average delay $\frac{1}{T}\sum_{t=1}^T d_t$ and $\sigma^2$ is the variance of the stochastic gradients.

Distributed Optimization
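
To illustrate the delay model behind this bound (not the paper's specific delay-robust algorithm), here is a sketch of SGD in which the gradient applied at step $t$ was computed at the iterate from $d_t$ steps earlier; the quantity $\tau$ in the bound is the average of these delays. The quadratic objective in the usage line is a placeholder.

```python
import numpy as np

def delayed_sgd(grad_fn, x0, delays, lr=0.01):
    """Sketch of asynchronous/delayed SGD: the gradient applied at step t
    was evaluated at the stale iterate from d_t steps earlier.  This only
    illustrates the delay model and the average delay tau = mean(d_t)."""
    iterates = [np.array(x0, dtype=float)]
    for t, d in enumerate(delays):
        stale = iterates[max(0, t - d)]          # iterate the gradient was computed at
        x_next = iterates[-1] - lr * grad_fn(stale)
        iterates.append(x_next)
    tau = float(np.mean(delays))                 # the quantity the rate depends on
    return iterates[-1], tau

# Usage: minimize a quadratic under random delays.
grad = lambda x: 2.0 * x
x_final, tau = delayed_sgd(grad, x0=[5.0, -3.0], delays=np.random.randint(0, 10, size=500))
```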

Optimal Rates for Random Order Online Optimization

no code implementations NeurIPS 2021 Uri Sherman, Tomer Koren, Yishay Mansour

We study online convex optimization in the random order model, recently proposed by Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order.

Never Go Full Batch (in Stochastic Convex Optimization)

no code implementations NeurIPS 2021 Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

We study the generalization performance of full-batch optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), and include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants.

Best-of-All-Worlds Bounds for Online Learning with Feedback Graphs

no code implementations 20 Jul 2021 Liad Erez, Tomer Koren

We study the online learning with feedback graphs framework introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph $G$ over the available actions.

Learning Rate Grafting: Transferability of Optimizer Tuning

no code implementations 29 Sep 2021 Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang

In the empirical science of training large neural networks, the learning rate schedule is a notoriously challenging-to-tune hyperparameter, which can depend on all other properties (architecture, optimizer, batch size, dataset, regularization, ...) of the problem.

Towards Best-of-All-Worlds Online Learning with Feedback Graphs

no code implementations NeurIPS 2021 Liad Erez, Tomer Koren

We study the online learning with feedback graphs framework introduced by Mannor and Shamir (2011), in which the feedback received by the online learner is specified by a graph $G$ over the available actions.

Stability vs Implicit Bias of Gradient Methods on Separable Data and Beyond

no code implementations 27 Feb 2022 Matan Schliserman, Tomer Koren

Finally, as direct applications of the general bounds, we return to the setting of linear classification with separable data and establish several novel test loss and test accuracy bounds for gradient descent and stochastic gradient descent for a variety of loss functions with different tail decay rates.

Generalization Bounds

Benign Underfitting of Stochastic Gradient Descent

no code implementations 27 Feb 2022 Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data.

Efficient Online Linear Control with Stochastic Convex Costs and Unknown Dynamics

no code implementations 2 Mar 2022 Asaf Cassel, Alon Cohen, Tomer Koren

We consider the problem of controlling an unknown linear dynamical system under a stochastic convex cost and full feedback of both the state and cost function.

Rate-Optimal Online Convex Optimization in Adaptive Linear Control

no code implementations 3 Jun 2022 Asaf Cassel, Alon Cohen, Tomer Koren

We consider the problem of controlling an unknown linear dynamical system under adversarially changing convex costs and full feedback of both the state and cost function.

Better Best of Both Worlds Bounds for Bandits with Switching Costs

no code implementations 7 Jun 2022 Idan Amir, Guy Azov, Tomer Koren, Roi Livni

We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer, Seldin, and Cesa-Bianchi (2021).

Uniform Stability for First-Order Empirical Risk Minimization

no code implementations 17 Jul 2022 Amit Attia, Tomer Koren

We consider the problem of designing uniformly stable first-order optimization algorithms for empirical risk minimization.

Regret Minimization and Convergence to Equilibria in General-sum Markov Games

no code implementations 28 Jul 2022 Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour

Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence.

Dueling Convex Optimization with General Preferences

no code implementations 27 Sep 2022 Aadirupa Saha, Tomer Koren, Yishay Mansour

We address the problem of convex optimization with dueling feedback, where the goal is to minimize a convex function given a weaker form of dueling feedback.

Private Online Prediction from Experts: Separations and Faster Rates

no code implementations 24 Oct 2022 Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Our lower bounds also show a separation between pure and approximate differential privacy for adaptive adversaries where the latter is necessary to achieve the non-private $O(\sqrt{T})$ regret.

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

no code implementations 30 Jan 2023 Uri Sherman, Tomer Koren, Yishay Mansour

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory conditions. We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror descent and least-squares policy evaluation in an auxiliary MDP used to compute exploration bonuses. Our algorithm obtains an $\widetilde O(K^{6/7})$ regret bound, improving significantly over the previous state of the art of $\widetilde O (K^{14/15})$ in this setting.

reinforcement-learning Reinforcement Learning (RL)

SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance

no code implementations 17 Feb 2023 Amit Attia, Tomer Koren

We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization.

Stochastic Optimization
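
A sketch of the stepsize rule named in this entry, written as the scalar (norm) variant of AdaGrad: the stepsize is $\eta$ divided by the square root of the running sum of squared gradient norms, so no smoothness or noise parameters need to be known in advance. The abstract snippet does not say which variant the analysis covers, so treat this as a generic illustration of AdaGrad stepsizes rather than the paper's exact algorithm.

```python
import numpy as np

def sgd_adagrad_norm(grad_fn, x0, T=1000, eta=1.0, eps=1e-8):
    """SGD with a scalar AdaGrad (AdaGrad-Norm) stepsize: the learning rate
    self-tunes as eta / sqrt(sum of squared gradient norms seen so far)."""
    x = np.array(x0, dtype=float)
    accum = 0.0
    for _ in range(T):
        g = grad_fn(x)
        accum += float(np.dot(g, g))            # running sum of squared gradient norms
        x -= eta / (np.sqrt(accum) + eps) * g
    return x

# Usage: quadratic objective with gradient 2x.
x = sgd_adagrad_norm(lambda x: 2.0 * x, x0=[3.0, -1.0], T=500)
```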

Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime

no code implementations 27 Feb 2023 Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

We also develop an adaptive algorithm for the small-loss setting with regret $O(L^\star\log d + \varepsilon^{-1} \log^{1.5} d)$ where $L^\star$ is the total loss of the best expert.

Rate-Optimal Policy Optimization for Linear Markov Decision Processes

no code implementations 28 Aug 2023 Uri Sherman, Alon Cohen, Tomer Koren, Yishay Mansour

We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal $\widetilde O (\sqrt K)$ regret where $K$ denotes the number of episodes.

Locally Optimal Descent for Dynamic Stepsize Scheduling

no code implementations 23 Nov 2023 Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain

We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice.

Scheduling Stochastic Optimization

Faster Convergence with Multiway Preferences

no code implementations 19 Dec 2023 Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour

We next study an $m$-multiway comparison ('battling') feedback model, where the learner gets to see the argmin feedback of an $m$-subset of the queried points, and show a convergence rate of $\widetilde O(\frac{d}{\min\{\log m, d\}\,\epsilon})$.

The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization

no code implementations 22 Jan 2024 Matan Schliserman, Uri Sherman, Tomer Koren

Our bound translates to a lower bound of $\Omega (\sqrt{d})$ on the number of training examples required for standard GD to reach a non-trivial test error, answering an open question raised by Feldman (2016) and Amir, Koren, and Livni (2021b) and showing that a non-trivial dimension dependence is unavoidable.

How Free is Parameter-Free Stochastic Optimization?

no code implementations 5 Feb 2024 Amit Attia, Tomer Koren

We study the problem of parameter-free stochastic optimization, asking whether, and under what conditions, fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods, without requiring significant knowledge of the true problem parameters.

Stochastic Optimization

A Note on High-Probability Analysis of Algorithms with Exponential, Sub-Gaussian, and General Light Tails

no code implementations 5 Mar 2024 Amit Attia, Tomer Koren

This short note describes a simple technique for analyzing probabilistic algorithms that rely on a light-tailed (but not necessarily bounded) source of randomization.

Stochastic Optimization
