no code implementations • 11 Feb 2024 • Itay Safran, Daniel Reichman, Paul Valiant
We prove an exponential separation between depth 2 and depth 3 neural networks, when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution with support in $[0, 1]^{d}$, assuming exponentially bounded weights.
no code implementations • 18 Jul 2023 • Itay Safran, Daniel Reichman, Paul Valiant
Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights.
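To see informally why depth helps here, note the exact identity $\max\{a, b\} = a + \max\{0, b - a\}$, which lets a ReLU network of depth roughly $\log_2 d$ compute the maximum of $d$ inputs exactly via pairwise maxima. The Python sketch below illustrates this folklore construction; it is not the paper's construction, and the lower bound concerns what depth $2$ networks with bounded weights cannot do.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pairwise_max(a, b):
    # Exact identity: max(a, b) = a + ReLU(b - a), i.e. one ReLU layer.
    return a + relu(b - a)

def deep_relu_max(x):
    # Compute max(x_1, ..., x_d) exactly with about log2(d) ReLU layers
    # by repeatedly taking pairwise maxima (a toy illustration only).
    vals = list(x)
    while len(vals) > 1:
        nxt = [pairwise_max(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2 == 1:  # carry the unpaired element to the next layer
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

x = np.random.rand(8)                # a point in [0, 1]^d with d = 8
print(deep_relu_max(x), x.max())     # the two values coincide
```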
no code implementations • 18 May 2022 • Itay Safran, Gal Vardi, Jason D. Lee
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting.
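As a concrete illustration of this setting (not of the paper's analysis), the sketch below trains a one-hidden-layer univariate ReLU network on binary labels with the logistic loss, using gradient descent with a small step size as a crude proxy for gradient flow; the data, width, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Univariate inputs with labels in {-1, +1} (illustrative data).
x = rng.uniform(-1.0, 1.0, size=20)
y = np.where(x >= 0, 1.0, -1.0)

k = 10                                        # hidden width
w = rng.normal(scale=0.1, size=k)             # input weights
b = rng.normal(scale=0.1, size=k)             # biases
v = rng.normal(scale=0.1, size=k)             # output weights

eta = 1e-3                                    # small step, mimicking gradient flow
for _ in range(20000):
    pre = np.outer(x, w) + b                  # (n, k) pre-activations
    act = np.maximum(pre, 0.0)                # ReLU activations
    f = act @ v                               # network outputs f(x_i)
    # Logistic loss: (1/n) * sum_i log(1 + exp(-y_i * f(x_i))).
    g = -y / (1.0 + np.exp(y * f)) / len(x)   # dLoss / df_i
    mask = (pre > 0).astype(float)
    grad_v = act.T @ g
    grad_w = v * (mask.T @ (g * x))
    grad_b = v * (mask.T @ g)
    w, b, v = w - eta * grad_w, b - eta * grad_b, v - eta * grad_v

f_final = np.maximum(np.outer(x, w) + b, 0.0) @ v
print("final logistic loss:", np.mean(np.log1p(np.exp(-y * f_final))))
```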
no code implementations • 4 Dec 2021 • Itay Safran, Jason D. Lee
Depth separation results offer a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, by establishing that the former possess superior approximation capabilities.
1 code implementation • NeurIPS 2021 • Itay Safran, Ohad Shamir
Perhaps surprisingly, we prove that when the condition number is taken into account, without-replacement SGD \emph{does not} significantly improve on with-replacement SGD in terms of worst-case bounds, unless the number of epochs (passes over the data) is larger than the condition number.
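The two sampling schemes can be contrasted on a toy problem; the sketch below only makes the distinction concrete, and its least-squares instance, step size, and epoch count are illustrative rather than the paper's worst-case constructions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares objective: (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
n, d = 200, 20
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star

def sgd(sample_indices, epochs=50, eta=1e-2):
    x = np.zeros(d)
    for _ in range(epochs):
        for i in sample_indices():            # one epoch = n stochastic steps
            g = (A[i] @ x - b[i]) * A[i]      # gradient of the i-th summand
            x -= eta * g
    return np.linalg.norm(x - x_star)

# With replacement: each step draws an index uniformly at random.
dist_with = sgd(lambda: rng.integers(0, n, size=n))
# Without replacement (random reshuffling): a fresh permutation every epoch.
dist_without = sgd(lambda: rng.permutation(n))

print("with replacement:   ", dist_with)
print("without replacement:", dist_without)
```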
1 code implementation • 1 Jun 2020 • Itay Safran, Gilad Yehudai, Ohad Shamir
We prove that while the objective is strongly convex around the global minima when the teacher and student networks possess the same number of neurons, it is not even \emph{locally convex} after any amount of over-parameterization.
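To make the teacher-student setup concrete, the following sketch writes down a sampled proxy for the objective, with both networks of the form $\mathbf{x} \mapsto \sum_{i} \max\{0, \mathbf{w}_i^\top \mathbf{x}\}$ and a student with more neurons than the teacher; the dimensions, widths, and Gaussian inputs are illustrative assumptions and may differ from the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 5, 3, 6                 # input dim, teacher width, over-parameterized student width
W_teacher = rng.normal(size=(k, d))

def net(W, X):
    # f_W(x) = sum_i ReLU(w_i^T x)
    return np.maximum(X @ W.T, 0.0).sum(axis=1)

def objective(W_student, X):
    # Sampled proxy for E_x [ (f_student(x) - f_teacher(x))^2 ] over Gaussian inputs.
    return np.mean((net(W_student, X) - net(W_teacher, X)) ** 2)

X = rng.normal(size=(10000, d))
# One global minimum of the over-parameterized objective: copy the teacher's
# neurons and pad with zero neurons.  The paper shows the landscape around
# such minima need not even be locally convex once the student is wider.
W_student = np.vstack([W_teacher, np.zeros((m - k, d))])
print(objective(W_student, X))    # exactly zero
```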
no code implementations • 31 Jul 2019 • Itay Safran, Ohad Shamir
In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with replacement, we focus here on popular but poorly understood heuristics, which involve going over random permutations of the individual functions.
no code implementations • 15 Apr 2019 • Itay Safran, Ronen Eldan, Ohad Shamir
Existing depth separation results for constant-depth networks essentially show that certain radial functions in $\mathbb{R}^d$, which can be easily approximated with depth $3$ networks, cannot be approximated by depth $2$ networks, even up to constant accuracy, unless their size is exponential in $d$.
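The depth $3$ upper bound exploits the fact that a radial function depends on its input only through $\|\mathbf{x}\|^2 = \sum_i x_i^2$: one hidden layer can approximate each $x_i^2$ by a piecewise-linear combination of ReLUs, and a second hidden layer can then approximate the univariate radial profile. The sketch below implements this idea for an illustrative profile; it is a simplified illustration, not the papers' exact construction or error analysis.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pwl_relu_coeffs(h, knots):
    # Return (c0, knots[:-1], deltas) such that
    #   h(t) ~ c0 + sum_j deltas[j] * relu(t - knots[j])
    # is the piecewise-linear interpolant of h at the given knots.
    vals = h(knots)
    slopes = np.diff(vals) / np.diff(knots)
    deltas = np.diff(np.concatenate(([0.0], slopes)))
    return vals[0], knots[:-1], deltas

d = 4
g = np.cos                                     # illustrative radial profile: f(x) = cos(||x||^2)

# Layer 1 approximates t -> t^2 on [-1, 1]; layer 2 approximates g on [0, d].
c0_sq, knots_sq, deltas_sq = pwl_relu_coeffs(lambda t: t ** 2, np.linspace(-1, 1, 41))
c0_g, knots_g, deltas_g = pwl_relu_coeffs(g, np.linspace(0, d, 81))

def depth3_radial(X):
    # First hidden layer: one group of ReLU units per coordinate, summed linearly
    # to produce r(x) ~ ||x||^2.
    r = d * c0_sq + sum(deltas_sq @ relu(X[:, i:i + 1] - knots_sq).T for i in range(d))
    # Second hidden layer: ReLU units applied to the scalar r, approximating g(r).
    return c0_g + deltas_g @ relu(r[:, None] - knots_g).T

X = np.random.uniform(-1, 1, size=(5, d))
print(depth3_radial(X))
print(g((X ** 2).sum(axis=1)))                 # close to the network's outputs
```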
no code implementations • 30 Jan 2019 • Adi Shamir, Itay Safran, Eyal Ronen, Orr Dunkelman
The existence of adversarial examples, in which an imperceptible change in the input can fool well-trained neural networks, was experimentally discovered by Szegedy et al. in 2013 in a paper titled "Intriguing properties of neural networks".
1 code implementation • ICML 2018 • Itay Safran, Ohad Shamir
We consider the optimization problem associated with training simple ReLU neural networks of the form $\mathbf{x}\mapsto \sum_{i=1}^{k}\max\{0,\mathbf{w}_i^\top \mathbf{x}\}$ with respect to the squared loss.
no code implementations • ICML 2017 • Itay Safran, Ohad Shamir
We provide several new depth-based separation results for feed-forward neural networks, proving that various types of simple and natural functions can be better approximated using deeper networks than shallower ones, even if the shallower networks are much larger.
no code implementations • 13 Nov 2015 • Itay Safran, Ohad Shamir
Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years across a variety of difficult machine learning applications.