no code implementations • 15 Jan 2024 • Amit Daniely, Mariano Schain, Gilad Yehudai
Optimizing neural networks is a difficult task that is still not well understood.
no code implementations • 23 Nov 2023 • Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice.
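As a purely illustrative sketch (not the scheme proposed in the paper), the snippet below shows the kind of dynamic, loss-driven stepsize adaptation that this line of work contrasts with hand-tuned schedules; the toy objective and the grow/shrink factors are assumptions made for the example.

```python
# Illustrative only: a simple loss-adaptive stepsize rule on a toy quadratic.
# The objective and the grow/shrink factors (1.1 / 0.5) are assumptions for
# this sketch; they are not the scheme introduced in the paper.
import numpy as np

def loss(w):
    return 0.5 * float(np.sum(w ** 2))

def grad(w):
    return w

rng = np.random.default_rng(0)
w = rng.standard_normal(10)
lr, prev = 0.5, loss(w)
for _ in range(100):
    w = w - lr * grad(w)
    cur = loss(w)
    # Dynamic rule: grow the stepsize while the loss keeps dropping, shrink it otherwise.
    lr = lr * 1.1 if cur < prev else lr * 0.5
    prev = cur

print(f"final loss: {prev:.2e}, final stepsize: {lr:.2e}")
```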
no code implementations • NeurIPS 2023 • Guy Kornowski, Gilad Yehudai, Ohad Shamir
Thus, we show that the input dimension has a crucial role on the type of overfitting in this setting, which we also validate empirically for intermediate dimensions.
no code implementations • 5 May 2023 • Gon Buzaglo, Niv Haim, Gilad Yehudai, Gal Vardi, Michal Irani
Reconstructing samples from the training set of trained neural networks is a major privacy concern.
1 code implementation • 15 Jun 2022 • Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani
We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.
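A rough sketch of the underlying idea, offered as a hedged reading rather than a restatement of the paper: for suitable (e.g. homogeneous) networks trained with gradient-based methods on binary classification, the implicit bias pushes the trained parameters $\theta$ toward a KKT point of the margin-maximization problem, so that

$$\theta \approx \sum_{i=1}^{n} \lambda_i y_i \nabla_\theta f(\theta; \mathbf{x}_i), \qquad \lambda_i \ge 0,$$

and reconstruction can then be cast as searching for candidate points $\mathbf{x}_i$ and coefficients $\lambda_i$ under which this stationarity condition approximately holds for the given trained $\theta$.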
no code implementations • 9 Feb 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples.
no code implementations • 8 Feb 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network's architecture), whose number of parameters is larger by essentially only a linear factor.
no code implementations • ICLR 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
no code implementations • NeurIPS 2021 • Gal Vardi, Gilad Yehudai, Ohad Shamir
We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x} \mapsto \sigma(\langle\mathbf{w},\mathbf{x}\rangle + b)$) in the realizable setting with the ReLU activation, using gradient descent.
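A minimal NumPy sketch of this setting (the dimension, sample size, stepsize, Gaussian inputs, and initialization are illustrative assumptions, not the paper's analysis): gradient descent on the empirical squared loss of a single ReLU neuron with a bias term, with labels generated by a ground-truth neuron so the problem is realizable.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lr, steps = 5, 1000, 0.1, 500

# Ground-truth (realizable) teacher neuron.
w_star, b_star = rng.standard_normal(d), 0.5
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star + b_star, 0.0)           # sigma = ReLU

# Small random initialization (an all-zeros start would have zero gradient).
w, b = 0.1 * rng.standard_normal(d), 0.1
for _ in range(steps):
    pre = X @ w + b
    err = (np.maximum(pre, 0.0) - y) * (pre > 0)    # ReLU'(z) taken as 1[z > 0]
    w -= lr * (X.T @ err) / n                       # gradient of 0.5 * mean squared loss
    b -= lr * err.mean()

print("final squared loss:", np.mean((np.maximum(X @ w + b, 0.0) - y) ** 2))
```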
no code implementations • 31 Jan 2021 • Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir
On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks.
no code implementations • 17 Oct 2020 • Gilad Yehudai, Ethan Fetaya, Eli Meirom, Gal Chechik, Haggai Maron
In this paper, we identify an important type of data where generalization from small to large graphs is challenging: graph distributions for which the local structure depends on the graph size.
no code implementations • 28 Sep 2020 • Gilad Yehudai, Ethan Fetaya, Eli Meirom, Gal Chechik, Haggai Maron
We further demonstrate on several tasks, that training GNNs on small graphs results in solutions which do not generalize to larger graphs.
1 code implementation • 1 Jun 2020 • Itay Safran, Gilad Yehudai, Ohad Shamir
We prove that while the objective is strongly convex around the global minima when the teacher and student networks possess the same number of neurons, it is not even locally convex after any amount of over-parameterization.
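To make the object under study concrete, here is a small Monte-Carlo sketch of the teacher-student squared-loss objective under Gaussian inputs; the teacher/student widths and the sampling are illustrative assumptions, and the snippet only evaluates the objective at a point rather than reproducing any analysis from the paper.

```python
# Monte-Carlo estimate of the teacher-student objective under Gaussian inputs:
#   F(W) = E_x [ ( sum_j relu(w_j . x) - sum_i relu(v_i . x) )^2 ].
# Dimensions and widths below are illustrative; the student is over-parameterized.
import numpy as np

rng = np.random.default_rng(0)
d, k_teacher, k_student, n = 10, 3, 6, 50_000

V = rng.standard_normal((k_teacher, d))          # fixed teacher neurons
W = rng.standard_normal((k_student, d))          # student neurons (the optimization variable)

X = rng.standard_normal((n, d))
teacher_out = np.maximum(X @ V.T, 0.0).sum(axis=1)
student_out = np.maximum(X @ W.T, 0.0).sum(axis=1)
objective = np.mean((student_out - teacher_out) ** 2)
print(f"estimated objective at this W: {objective:.3f}")
```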
no code implementations • ICML 2020 • Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir
The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly-initialized network contains a small subnetwork that, when trained in isolation, can compete with the performance of the original network.
no code implementations • 15 Jan 2020 • Gilad Yehudai, Ohad Shamir
We consider the fundamental problem of learning a single neuron $x \mapsto \sigma(w^\top x)$ using standard gradient methods.
no code implementations • NeurIPS 2019 • Gilad Yehudai, Ohad Shamir
Recently, a spate of papers have provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error).