no code implementations • 9 Feb 2024 • Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry
We prove that such a random NN interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels.
no code implementations • 25 Jan 2024 • Yaniv Blumenfeld, Itay Hubara, Daniel Soudry
The majority of the research on the quantization of Deep Neural Networks (DNNs) is focused on reducing the precision of tensors visible to high-level frameworks (e.g., weights, activations, and gradients).
no code implementations • 23 Jan 2024 • Daniel Goldfarb, Itay Evron, Nir Weinberger, Daniel Soudry, Paul Hand
Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization.
no code implementations • NeurIPS 2023 • Chen Zeno, Greg Ongie, Yaniv Blumenfeld, Nir Weinberger, Daniel Soudry
Neural network (NN) denoisers are an essential building block in many common tasks, ranging from image reconstruction to image generation.
no code implementations • 30 Jun 2023 • Mor Shpigel Nacson, Rotem Mulayoff, Greg Ongie, Tomer Michaeli, Daniel Soudry
Finally, we prove that if a function is sufficiently smooth (in a Sobolev sense) then it can be approximated arbitrarily well using shallow ReLU networks that correspond to stable solutions of gradient descent.
1 code implementation • NeurIPS 2023 • Niv Giladi, Shahar Gottlieb, Moran Shkolnik, Asaf Karnieli, Ron Banner, Elad Hoffer, Kfir Yehuda Levy, Daniel Soudry
Thus, these methods are limited by the delays caused by straggling workers.
no code implementations • 6 Jun 2023 • Itay Evron, Edward Moroshko, Gon Buzaglo, Maroun Khriesh, Badea Marjieh, Nathan Srebro, Daniel Soudry
We analyze continual learning on a sequence of separable linear classification tasks with binary labels.
1 code implementation • NeurIPS 2023 • Ev Zisselman, Itai Lavie, Daniel Soudry, Aviv Tamar
Our insight is that learning a policy that effectively $\textit{explores}$ the domain is harder to memorize than a policy that maximizes reward for a specific task, and therefore we expect such learned behavior to generalize well; we indeed demonstrate this empirically on several domains that are difficult for invariance-based approaches.
no code implementations • 22 May 2023 • Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Yair Carmon
Using this result, we characterize settings where GD provably converges to the EoS in scalar networks.
1 code implementation • CVPR 2023 • Hagay Michaeli, Tomer Michaeli, Daniel Soudry
Although CNNs are believed to be invariant to translations, recent works have shown this is not the case, due to aliasing effects that stem from downsampling layers.
no code implementations • 10 Feb 2023 • Itay Evron, Ophir Onn, Tamar Weiss Orzech, Hai Azeroual, Daniel Soudry
Error-correcting codes (ECC) are used to reduce multiclass classification tasks to multiple binary classification subproblems.
no code implementations • 19 May 2022 • Itay Evron, Edward Moroshko, Rachel Ward, Nati Srebro, Daniel Soudry
In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas.
1 code implementation • 21 Mar 2022 • Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry
We show that while minimization of the MSE works fine for pruning the activations, it catastrophically fails for the neural gradients.
no code implementations • 19 Dec 2021 • Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry
Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phases to 4-bit, achieving state-of-the-art results in 4-bit training without overhead.
no code implementations • NeurIPS 2021 • Rotem Mulayoff, Tomer Michaeli, Daniel Soudry
First, we extend the existing knowledge on minima stability to non-differentiable minima, which are common in ReLU nets.
no code implementations • 29 Sep 2021 • Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry
Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phase to 4-bit, achieving state-of-the-art results in 4-bit training.
no code implementations • 24 Sep 2021 • Aviv Tamar, Daniel Soudry, Ev Zisselman
In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought.
1 code implementation • NeurIPS 2021 • Niv Giladi, Zvika Ben-Haim, Sella Nevo, Yossi Matias, Daniel Soudry
Background: Floods are the most common natural disaster in the world, affecting the lives of hundreds of millions.
no code implementations • ICLR 2022 • Matan Haroush, Tzviel Frostig, Ruth Heller, Daniel Soudry
Our method achieves comparable or better results than state-of-the-art methods on well-accepted OOD benchmarks, without retraining the network parameters or assuming prior knowledge on the test distribution -- and at a fraction of the computational cost.
no code implementations • 19 Feb 2021 • Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.
1 code implementation • NeurIPS 2021 • Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Seffi Naor, Daniel Soudry
Finally, to solve the problem of switching between different structure constraints, we suggest a method to convert a pre-trained model with unstructured sparsity to an N:M fine-grained block sparsity model with little to no training.
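To make the N:M constraint concrete, the following is a minimal sketch of a magnitude-based N:M mask: in every group of M consecutive weights, only the N largest-magnitude entries are kept. This is a generic illustration of the sparsity pattern, not the conversion method proposed in the paper; the function name `nm_mask` and the example weights are hypothetical.

```python
import numpy as np

def nm_mask(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Boolean mask keeping the n largest-magnitude entries in each group of m.

    Groups are formed from consecutive elements along the last axis,
    so the last dimension must be divisible by m (an assumption of this sketch).
    """
    assert weights.shape[-1] % m == 0, "last dimension must be divisible by m"
    groups = weights.reshape(-1, m)
    # Indices of the n largest |w| inside each group.
    keep = np.argsort(-np.abs(groups), axis=1)[:, :n]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(weights.shape)

# Hypothetical weight row: two groups of four weights each.
w = np.array([[0.1, -0.9, 0.3, 0.05, 2.0, -0.2, 0.0, 1.5]])
mask = nm_mask(w, n=2, m=4)
sparse_w = w * mask  # exactly 2 non-masked entries per group of 4
```

Pure magnitude selection is only one possible rule for choosing which entries survive; it is used here because it is simple to verify by hand.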
2 code implementations • 1 Jan 2021 • Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry
Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.
no code implementations • AABI Symposium 2021 • Chen Zeno, Itay Golan, Ari Pakman, Daniel Soudry
Recent works have shown that the predictive accuracy of Bayesian deep learning models exhibits substantial improvements when the posterior is raised to a 1/T power with T<1.
1 code implementation • 1 Oct 2020 • Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry
The optimal Bayesian solution for this requires an intractable online Bayes update to the weights posterior.
no code implementations • NeurIPS 2020 • Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry
We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks".
1 code implementation • ICML 2020 • Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry
Deep neural networks are typically initialized with random weights, with variances chosen to facilitate signal propagation and stable gradients.
no code implementations • ICLR 2021 • Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry
While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations.
1 code implementation • 14 Jun 2020 • Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry
Instead, these methods only use the calibration set to set the activations' dynamic ranges.
1 code implementation • 20 Feb 2020 • Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro
We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
no code implementations • 29 Dec 2019 • Tzofnat Greenberg Toledo, Ben Perach, Itay Hubara, Daniel Soudry, Shahar Kvatinsky
A recent example is the GXNOR framework for stochastic training of ternary (TNN) and binary (BNN) neural networks.
1 code implementation • 11 Dec 2019 • Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry
Standard practice in training neural networks involves initializing the weights in an independent fashion.
no code implementations • CVPR 2020 • Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry
Then, we demonstrate how these samples can be used to calibrate and fine-tune quantized models without using any real data in the process.
1 code implementation • NeurIPS 2019 • Ron Banner, Yury Nahshan, Daniel Soudry
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources.
no code implementations • ICLR 2020 • Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro
In this paper, we characterize the norm required to realize a function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ as a single hidden-layer ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded, including precisely characterizing which functions can be realized with finite norm.
1 code implementation • ICLR 2020 • Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry
However, asynchronous training has its pitfalls, mainly a degradation in generalization, even after convergence of the algorithm.
2 code implementations • 12 Aug 2019 • Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry
Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.
1 code implementation • 13 Jun 2019 • Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro
A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution.
1 code implementation • NeurIPS 2019 • Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry
Reducing the precision of weights and activation functions in neural network training, with minimal impact on performance, is essential for the deployment of these models in resource-constrained environments.
no code implementations • 17 May 2019 • Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry
With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization lead to margin maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.
1 code implementation • ICLR 2019 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
We analyze the trade-off between quantization noise and clipping distortion in low precision networks.
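The trade-off can be illustrated numerically. The sketch below applies a uniform quantizer with a symmetric clipping threshold `alpha` to synthetic Laplace-distributed values and sweeps `alpha`: a small threshold gives large clipping distortion, a large threshold gives coarse quantization steps, and the mean squared error is lowest somewhere in between. The Laplace assumption, the 4-bit setting, and the helper `quantize_clipped` are illustrative choices, not the analysis from the paper.

```python
import numpy as np

def quantize_clipped(x: np.ndarray, alpha: float, bits: int) -> np.ndarray:
    """Clip x to [-alpha, alpha], then round onto a uniform grid of 2**bits levels."""
    levels = 2 ** bits - 1           # number of steps between the end points
    step = 2.0 * alpha / levels      # grid spacing grows with alpha
    clipped = np.clip(x, -alpha, alpha)
    return np.round((clipped + alpha) / step) * step - alpha

rng = np.random.default_rng(0)
x = rng.laplace(scale=1.0, size=100_000)  # synthetic stand-in for a tensor

# Sweep the clipping threshold from very tight up to "no clipping at all".
alphas = np.linspace(0.5, float(np.abs(x).max()), 50)
mse = [float(np.mean((x - quantize_clipped(x, a, bits=4)) ** 2)) for a in alphas]

best_index = int(np.argmin(mse))
best_alpha = float(alphas[best_index])
```

In this sweep `alphas[-1]` corresponds to no clipping, so comparing `mse[best_index]` with `mse[-1]` shows how much error a well-chosen threshold removes for this synthetic distribution.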
no code implementations • 13 Feb 2019 • Pedro Savarese, Itay Evron, Daniel Soudry, Nathan Srebro
We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or equivalently what is the minimal norm required to approximate a given function.
1 code implementation • 27 Jan 2019 • Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of deep neural networks and datasets.
2 code implementations • 2 Oct 2018 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources.
no code implementations • 5 Jun 2018 • Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry
We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data.
no code implementations • NeurIPS 2018 • Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro
We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain.
3 code implementations • NeurIPS 2018 • Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry
Armed with this knowledge, we quantize the model parameters, activations and layer gradients to 8-bit, leaving at a higher precision only the final step in the computation of the weight gradients.
no code implementations • 13 May 2018 • Zhihui Zhu, Daniel Soudry, Yonina C. Eldar, Michael B. Wakin
We examine the squared error loss landscape of shallow linear neural networks.
2 code implementations • 27 Mar 2018 • Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry
However, research for scenarios in which task boundaries are unknown during training has been lacking.
no code implementations • 5 Mar 2018 • Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry
We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.
4 code implementations • NeurIPS 2018 • Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry
Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications.
no code implementations • ICML 2018 • Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro
We study the implicit bias of generic optimization methods, such as mirror descent, natural gradient descent, and steepest descent with respect to different potentials and norms, when optimizing underdetermined linear regression or separable linear classification problems.
no code implementations • 14 Feb 2018 • Elad Hoffer, Shai Fine, Daniel Soudry
Deep convolutional networks have been the state-of-the-art approach for a wide variety of tasks over the last few years.
3 code implementations • ICLR 2018 • Elad Hoffer, Itay Hubara, Daniel Soudry
Neural networks are commonly used as models for classification for a wide variety of tasks.
2 code implementations • ICLR 2018 • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.
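A minimal sketch of this setting: plain gradient descent on the unregularized logistic loss with a homogeneous (bias-free) linear predictor, run on a tiny separable dataset. The three data points are hypothetical and chosen so that the L2 maximum-margin direction is (1, 0) by symmetry; the step size and iteration count are arbitrary. The run shows the weight norm growing while the weight direction approaches the maximum-margin direction.

```python
import numpy as np

# Each row is y_i * x_i for a separable dataset (hypothetical toy data).
# The first two rows are symmetric support vectors, so the minimum-norm
# solution of  w @ z_i >= 1  is w = (1, 0).
Z = np.array([[1.0, 0.5],
              [1.0, -0.5],
              [3.0, 2.0]])

w = np.zeros(2)
lr = 0.1
for _ in range(20_000):
    margins = Z @ w
    # Gradient of sum_i log(1 + exp(-margin_i)) with respect to w.
    grad = -(Z * (1.0 / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)
    w -= lr * grad

weight_norm = float(np.linalg.norm(w))
direction = w / weight_norm
max_margin_direction = np.array([1.0, 0.0])
cosine = float(direction @ max_margin_direction)
```

Because the loss has no finite minimizer on separable data, `weight_norm` keeps increasing with more iterations; only the direction `w / ||w||` settles.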
1 code implementation • NeurIPS 2017 • Elad Hoffer, Itay Hubara, Daniel Soudry
Following this hypothesis we conducted experiments to show empirically that the "generalization gap" stems from the relatively small number of updates rather than the batch size, and can be completely eliminated by adapting the training regime used.
1 code implementation • ICLR 2018 • Daniel Soudry, Elad Hoffer
We prove that, with high probability in the limit of $N\rightarrow\infty$ datapoints, the volume of differentiable regions of the empiric loss containing sub-optimal differentiable local minima is exponentially vanishing in comparison with the same volume of global minima, given standard normal input of dimension $d_{0}=\tilde{\Omega}\left(\sqrt{N}\right)$, and a more realistic number of $d_{1}=\tilde{\Omega}\left(N/d_{0}\right)$ hidden units.
5 code implementations • 22 Sep 2016 • Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved accuracy comparable to their 32-bit counterparts using only 4 bits.
no code implementations • 26 May 2016 • Daniel Soudry, Yair Carmon
We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima.
26 code implementations • 9 Feb 2016 • Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.
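As a rough illustration of binarization, the sketch below maps real values to +1 or -1 with a sign function in the forward pass and, in the backward pass, passes the incoming gradient through unchanged wherever the input lies in [-1, 1] (a straight-through estimator). The function names are hypothetical, and this is a NumPy sketch of the general idea rather than the authors' implementation.

```python
import numpy as np

def binarize_forward(x: np.ndarray) -> np.ndarray:
    """Deterministic binarization: +1 for x >= 0, otherwise -1."""
    return np.where(x >= 0, 1.0, -1.0)

def binarize_backward(x: np.ndarray, grad_out: np.ndarray) -> np.ndarray:
    """Straight-through estimator: keep the gradient where |x| <= 1, zero it elsewhere."""
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-1.7, -0.2, 0.0, 0.4, 2.3])   # real-valued pre-activations
xb = binarize_forward(x)                     # values used at run-time
grad_x = binarize_backward(x, np.ones_like(x))
```

The sign function has zero derivative almost everywhere, which is why a surrogate gradient of this kind is needed to train through it.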
2 code implementations • NeurIPS 2016 • Itay Hubara, Daniel Soudry, Ran El Yaniv
We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time and when computing the parameters' gradient at train-time.
1 code implementation • 12 Mar 2015 • Zhiyong Cheng, Daniel Soudry, Zexi Mao, Zhenzhong Lan
In this paper, we investigate the capability of BMNNs using the EBP algorithm on multiclass image classification tasks.
2 code implementations • NeurIPS 2014 • Daniel Soudry, Itay Hubara, Ron Meir
Using online EP and the central limit theorem we find an analytical approximation to the Bayes update of this posterior, as well as the resulting Bayes estimates of the weights and outputs.
11 code implementations • 9 Sep 2014 • Eftychios A. Pnevmatikakis, Yuanjun Gao, Daniel Soudry, David Pfau, Clay Lacefield, Kira Poskanzer, Randy Bruno, Rafael Yuste, Liam Paninski
We present a structured matrix factorization approach to analyzing calcium imaging recordings of large neuronal ensembles.
Neurons and Cognition • Quantitative Methods • Applications
no code implementations • 7 Oct 2013 • Daniel Soudry, Ron Meir
Significant success has been reported recently using deep neural networks for classification.
no code implementations • NeurIPS 2012 • Dmitri B. Chklovskii, Daniel Soudry
If noise-shaping were used in neurons, it would introduce correlations in spike timing to reduce low-frequency (up to Nyquist) transmission error at the cost of high-frequency error (from Nyquist to sampling rate).