You need to log in to edit.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

no code implementations • 8 Dec 2021 • Liran Szlak, Ohad Shamir

A commonly used heuristic in RL is experience replay (e. g.~\citet{lin1993reinforcement, mnih2015human}), in which a learner stores and re-uses past trajectories as if they were sampled online.

no code implementations • 8 Dec 2021 • Liran Szlak, Ohad Shamir

Experience replay \citep{lin1993reinforcement, mnih2015human} is a widely used technique to achieve efficient use of data and improved performance in RL algorithms.

no code implementations • 7 Oct 2021 • Gal Vardi, Gilad Yehudai, Ohad Shamir

We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.

no code implementations • NeurIPS 2021 • Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth

We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations performed between rounds of communication.

no code implementations • 6 Oct 2021 • Gal Vardi, Ohad Shamir, Nathan Srebro

Lyu and Li [2019] showed that in homogeneous networks trained with the exponential or the logistic loss, gradient flow converges to a KKT point of the max margin problem in the parameter space.

1 code implementation • NeurIPS 2021 • Itay Safran, Ohad Shamir

Perhaps surprisingly, we prove that when the condition number is taken into account, without-replacement SGD \emph{does not} significantly improve on with-replacement SGD in terms of worst-case bounds, unless the number of epochs (passes over the data) is larger than the condition number.

no code implementations • NeurIPS 2021 • Gal Vardi, Gilad Yehudai, Ohad Shamir

We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x} \mapsto \sigma(<\mathbf{w},\mathbf{x}> + b)$) in the realizable setting with the ReLU activation, using gradient descent.

no code implementations • NeurIPS 2021 • Guy Kornowski, Ohad Shamir

For this approach, we prove under a mild assumption an inherent trade-off between oracle complexity and smoothness: On the one hand, smoothing a nonsmooth nonconvex function can be done very efficiently (e. g., by randomized smoothing), but with dimension-dependent factors in the smoothness parameter, which can strongly affect iteration complexity when plugging into standard smooth optimization methods.

no code implementations • 2 Feb 2021 • Blake Woodworth, Brian Bullins, Ohad Shamir, Nathan Srebro

We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective, and during each round of communication, each machine may sequentially compute $K$ stochastic gradient estimates.

no code implementations • 31 Jan 2021 • Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks.

no code implementations • 30 Jan 2021 • Gal Vardi, Daniel Reichman, Toniann Pitassi, Ohad Shamir

We show a complexity-theoretic barrier to proving such results beyond size $O(d\log^2(d))$, but also show an explicit benign function, that can be approximated with networks of size $O(d)$ and not with networks of size $o(d/\log d)$.

no code implementations • 9 Dec 2020 • Gal Vardi, Ohad Shamir

For one hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. [2018].

no code implementations • 13 Oct 2020 • Guy Kornowski, Ohad Shamir

In this note, we consider the complexity of optimizing a highly smooth (Lipschitz $k$-th order derivative) and strongly convex function, via calls to a $k$-th order oracle which returns the value and first $k$ derivatives of the function at a given point, and where the dimension is unrestricted.

no code implementations • 30 Jun 2020 • Ohad Shamir

A line of recent works established that when training linear predictors over separable data, using gradient methods and exponentially-tailed losses, the predictors asymptotically converge in direction to the max-margin predictor.

1 code implementation • 1 Jun 2020 • Itay Safran, Gilad Yehudai, Ohad Shamir

We prove that while the objective is strongly convex around the global minima when the teacher and student networks possess the same number of neurons, it is not even \emph{locally convex} after any amount of over-parameterization.

no code implementations • NeurIPS 2020 • Gal Vardi, Ohad Shamir

To show this, we study a seemingly unrelated problem of independent interest: Namely, whether there are polynomially-bounded functions which require super-polynomial weights in order to approximate with constant-depth neural networks.

no code implementations • 27 Feb 2020 • Ohad Shamir

It is well-known that given a bounded, smooth nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points (where the gradient norm is less than $\epsilon$) in $\mathcal{O}(1/\epsilon^2)$ iterations.

no code implementations • ICML 2020 • Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.

no code implementations • ICML 2020 • Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

The lottery ticket hypothesis (Frankle and Carbin, 2018), states that a randomly-initialized network contains a small subnetwork such that, when trained in isolation, can compete with the performance of the original network.

no code implementations • 15 Jan 2020 • Gilad Yehudai, Ohad Shamir

We consider the fundamental problem of learning a single neuron $x \mapsto\sigma(w^\top x)$ using standard gradient methods.

no code implementations • ICML 2020 • Yoel Drori, Ohad Shamir

We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions.

no code implementations • 31 Jul 2019 • Itay Safran, Ohad Shamir

In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with replacement, we focus here on popular but poorly-understood heuristics, which involve going over random permutations of the individual functions.

no code implementations • 15 Apr 2019 • Itay Safran, Ronen Eldan, Ohad Shamir

Existing depth separation results for constant-depth networks essentially show that certain radial functions in $\mathbb{R}^d$, which can be easily approximated with depth $3$ networks, cannot be approximated by depth $2$ networks, even up to constant accuracy, unless their size is exponential in $d$.

no code implementations • NeurIPS 2019 • Gilad Yehudai, Ohad Shamir

Recently, a spate of papers have provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error).

no code implementations • 13 Feb 2019 • Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.

no code implementations • 9 Feb 2019 • Yuval Dagan, Gil Kur, Ohad Shamir

We show that fundamental learning tasks, such as finding an approximate linear separator or linear regression, require memory at least \emph{quadratic} in the dimension, in a natural streaming setting.

no code implementations • NeurIPS 2018 • Murat A. Erdogdu, Lester Mackey, Ohad Shamir

An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems.

no code implementations • 23 Sep 2018 • Ohad Shamir

We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots, w_k$), which arise in the context of training depth-$k$ linear neural networks.

no code implementations • 26 Jun 2018 • Yossi Arjevani, Ohad Shamir, Nathan Srebro

We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $\tau$ rounds ago.

no code implementations • NeurIPS 2018 • Ohad Shamir

In this paper, we rigorously prove that arbitrarily deep, nonlinear residual units indeed exhibit this behavior, in the sense that the optimization landscape contains no local minima with value above what can be obtained with a linear predictor (namely a 1-layer network).

no code implementations • 4 Mar 2018 • Yuval Dagan, Ohad Shamir

We study the problem of identifying correlations in multivariate data, under information constraints: Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines.

1 code implementation • ICML 2018 • Itay Safran, Ohad Shamir

We consider the optimization problem associated with training simple ReLU neural networks of the form $\mathbf{x}\mapsto \sum_{i=1}^{k}\max\{0,\mathbf{w}_i^\top \mathbf{x}\}$ with respect to the squared loss.

no code implementations • 18 Dec 2017 • Noah Golowich, Alexander Rakhlin, Ohad Shamir

We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer.

no code implementations • 2 Jun 2017 • Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

Exploiting the great expressive power of Deep Neural Network architectures, relies on the ability to train them.

no code implementations • 15 May 2017 • Nicolò Cesa-Bianchi, Ohad Shamir

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved, if the effective range of the losses in each round is small (e. g. the maximal difference between two losses in a given round).

1 code implementation • ICML 2017 • Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming state-of-the-art.

no code implementations • ICML 2017 • Ohad Shamir, Liran Szlak

In this paper, we consider the applicability of this setting to convex online learning with delayed feedback, in which the feedback on the prediction made in round $t$ arrives with some delay $\tau$.

no code implementations • ICML 2017 • Dan Garber, Ohad Shamir, Nathan Srebro

We study algorithms for estimating the leading principal component of the population covariance matrix that are both communication-efficient and achieve estimation error of the order of the centralized ERM solution that uses all $mn$ samples.

no code implementations • NeurIPS 2016 • Ohad Shamir

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled *with* replacement.

no code implementations • ICML 2017 • Yossi Arjevani, Ohad Shamir

Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations.

no code implementations • ICML 2017 • Itay Safran, Ohad Shamir

We provide several new depth-based separation results for feed-forward neural networks, proving that various types of simple and natural functions can be better approximated using deeper networks than shallower ones, even if the shallower networks are much larger.

no code implementations • 5 Sep 2016 • Ohad Shamir

Although neural networks are routinely and successfully trained in practice using simple gradient-based methods, most existing theoretical results are negative, showing that learning such networks is difficult, in a worst-case sense over all data distributions.

no code implementations • NeurIPS 2016 • Yossi Arjevani, Ohad Shamir

Many canonical machine learning problems boil down to a convex optimization problem with a finite sum structure.

no code implementations • 11 May 2016 • Yossi Arjevani, Ohad Shamir

We consider a broad class of first-order optimization algorithms which are \emph{oblivious}, in the sense that their step sizes are scheduled regardless of the function under consideration, except for limited side-information such as smoothness or strong convexity parameters.

no code implementations • NeurIPS 2016 • Ohad Shamir

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled \emph{with} replacement.

no code implementations • 12 Dec 2015 • Ronen Eldan, Ohad Shamir

We show that there is a simple (approximately radial) function on $\reals^d$, expressible by a small 3-layer feedforward neural networks, which cannot be approximated by any 2-layer network, to more than a certain constant accuracy, unless its width is exponential in the dimension.

no code implementations • 9 Dec 2015 • Jonathan Rosenski, Ohad Shamir, Liran Szlak

We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward.

no code implementations • 13 Nov 2015 • Itay Safran, Ohad Shamir

Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications.

no code implementations • 30 Sep 2015 • Ohad Shamir

We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i. i. d.

no code implementations • 31 Jul 2015 • Ohad Shamir

We consider the closely related problems of bandit convex optimization with two-point feedback, and zero-order stochastic convex optimization with two function evaluations per round.

no code implementations • 31 Jul 2015 • Ohad Shamir

We study the convergence properties of the VR-PCA algorithm introduced by \cite{shamir2015stochastic} for fast computation of leading singular vectors.

no code implementations • NeurIPS 2015 • Yossi Arjevani, Ohad Shamir

We study the fundamental limits to communication-efficient distributed methods for convex learning and optimization, under different assumptions on the information available to individual machines, and the types of functions considered.

no code implementations • 23 Mar 2015 • Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir

This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials whereby new lower and upper bounds are derived.

no code implementations • 5 Nov 2014 • Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir

In this paper, we study lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix.

no code implementations • 23 Oct 2014 • Doron Kukliansky, Ohad Shamir

In this paper we analyze a budgeted learning setting, in which the learner can only choose and observe a small subset of the attributes of each training example.

1 code implementation • NeurIPS 2014 • Roi Livni, Shai Shalev-Shwartz, Ohad Shamir

It is well-known that neural networks are computationally hard to train.

no code implementations • 30 Sep 2014 • Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.

no code implementations • 9 Sep 2014 • Ohad Shamir

We describe and analyze a simple algorithm for principal component analysis and singular value decomposition, VR-PCA, which uses computationally cheap stochastic iterations, yet converges exponentially fast to the optimal solution.

no code implementations • 11 Aug 2014 • Ohad Shamir

We study the attainable regret for online linear optimization problems with bandit feedback, where unlike the full-information setting, the player can only observe its own loss rather than the full loss vector.

no code implementations • 19 Jun 2014 • Ohad Shamir

In this short note, we provide a sample complexity lower bound for learning linear predictors with respect to the squared loss.

no code implementations • 10 Jun 2014 • Ethan Fetaya, Ohad Shamir, Shimon Ullman

We consider the problem of learning from a similarity matrix (such as spectral clustering and lowd imensional embedding), when computing pairwise similarities are costly, and only a limited number of entries can be observed.

1 code implementation • 30 Dec 2013 • Ohad Shamir, Nathan Srebro, Tong Zhang

We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems.

no code implementations • NeurIPS 2013 • Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir

In particular, we show that with switching costs, the attainable rate with bandit feedback is $T^{2/3}$.

no code implementations • NeurIPS 2014 • Ohad Shamir

Many machine learning approaches are characterized by information constraints on how they interact with the training data.

no code implementations • CVPR 2013 • Baoyuan Liu, Fereshteh Sadeghi, Marshall Tappen, Ohad Shamir, Ce Liu

Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows.

no code implementations • 26 Apr 2013 • Roi Livni, Shai Shalev-Shwartz, Ohad Shamir

The main goal of this paper is the derivation of an efficient layer-by-layer algorithm for training such networks, which we denote as the \emph{Basis Learner}.

no code implementations • NeurIPS 2013 • Nicolo Cesa-Bianchi, Ofer Dekel, Ohad Shamir

In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$.

no code implementations • NeurIPS 2012 • Sasha Rakhlin, Ohad Shamir, Karthik Sridharan

We show a principled way of deriving online learning algorithms from a minimax analysis.

no code implementations • 11 Sep 2012 • Ohad Shamir

The problem of stochastic convex optimization with bandit feedback (in the learning community) or without knowledge of gradients (in the optimization community) has received much attention in recent years, in the form of algorithms and performance upper bounds.

no code implementations • NeurIPS 2011 • Andrew Cotter, Ohad Shamir, Nati Srebro, Karthik Sridharan

Mini-batch algorithms have recently received significant attention as a way to speed-up stochastic convex optimization problems.

no code implementations • NeurIPS 2011 • Sham M. Kakade, Varun Kanade, Ohad Shamir, Adam Kalai

In this paper, we provide algorithms for learning GLMs and SIMs, which are both computationally and statistically efficient.

no code implementations • NeurIPS 2011 • Shie Mannor, Ohad Shamir

We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game.

no code implementations • NeurIPS 2011 • Nicolò Cesa-Bianchi, Ohad Shamir

Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader.

no code implementations • NeurIPS 2011 • Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan R. Salakhutdinov

We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions.

no code implementations • 13 Jun 2011 • Nicolò Cesa-Bianchi, Ohad Shamir

Most traditional online learning algorithms are based on variants of mirror descent or follow-the-leader.

no code implementations • 31 Oct 2009 • Sham M. Kakade, Ohad Shamir, Karthik Sridharan, Ambuj Tewari

The versatility of exponential families, along with their attendant convexity properties, make them a popular and effective statistical model.

no code implementations • NeurIPS 2008 • Ohad Shamir, Naftali Tishby

In this paper, we provide a set of general sufficient conditions, which ensure the reliability of clustering stability estimators in the large sample regime.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.