Search Results for author: Daniel Soudry

Found 65 papers, 33 papers with code

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

no code implementations9 Feb 2024 Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry

We prove that such a random NN interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels.

Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators

no code implementations25 Jan 2024 Yaniv Blumenfeld, Itay Hubara, Daniel Soudry

The majority of the research on the quantization of Deep Neural Networks (DNNs) is focused on reducing the precision of tensors visible by high-level frameworks (e.g., weights, activations, and gradients).

Quantization

How do Minimum-Norm Shallow Denoisers Look in Function Space?

no code implementations NeurIPS 2023 Chen Zeno, Greg Ongie, Yaniv Blumenfeld, Nir Weinberger, Daniel Soudry

Neural network (NN) denoisers are an essential building block in many common tasks, ranging from image reconstruction to image generation.

Image Generation Image Reconstruction

The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks

no code implementations30 Jun 2023 Mor Shpigel Nacson, Rotem Mulayoff, Greg Ongie, Tomer Michaeli, Daniel Soudry

Finally, we prove that if a function is sufficiently smooth (in a Sobolev sense) then it can be approximated arbitrarily well using shallow ReLU networks that correspond to stable solutions of gradient descent.

Explore to Generalize in Zero-Shot RL

1 code implementation NeurIPS 2023 Ev Zisselman, Itai Lavie, Daniel Soudry, Aviv Tamar

Our insight is that learning a policy that effectively $\textit{explores}$ the domain is harder to memorize than a policy that maximizes reward for a specific task, and therefore we expect such learned behavior to generalize well; we indeed demonstrate this empirically on several domains that are difficult for invariance-based approaches.

Zero-shot Generalization

Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations

1 code implementation CVPR 2023 Hagay Michaeli, Tomer Michaeli, Daniel Soudry

Although CNNs are believed to be invariant to translations, recent works have shown this is not the case, due to aliasing effects that stem from downsampling layers.
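
To make the aliasing point concrete, here is a minimal NumPy sketch (not the paper's code; the 1-D signal and sizes are arbitrary) showing that plain strided downsampling is not invariant to a one-sample shift of its input:

```python
import numpy as np

# Minimal sketch: strided downsampling breaks shift invariance.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)            # arbitrary 1-D "feature map"

def downsample(signal, stride=2):
    """Plain subsampling, as used after convolutions with stride > 1."""
    return signal[::stride]

y = downsample(x)                       # downsample the original
y_shifted = downsample(np.roll(x, 1))   # downsample a 1-sample shift of the same input

# If downsampling were shift-invariant (up to a shift), these would match after alignment.
print(np.allclose(y, np.roll(y_shifted, -1)))   # typically False: the two outputs sample different phases
```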

The Role of Codeword-to-Class Assignments in Error-Correcting Codes: An Empirical Study

no code implementations10 Feb 2023 Itay Evron, Ophir Onn, Tamar Weiss Orzech, Hai Azeroual, Daniel Soudry

Error-correcting codes (ECC) are used to reduce multiclass classification tasks to multiple binary classification subproblems.

Binary Classification Classification

How catastrophic can catastrophic forgetting be in linear regression?

no code implementations19 May 2022 Itay Evron, Edward Moroshko, Rachel Ward, Nati Srebro, Daniel Soudry

In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas.

Continual Learning regression

Optimal Fine-Grained N:M sparsity for Activations and Neural Gradients

1 code implementation21 Mar 2022 Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry

We show that while minimization of the MSE works fine for pruning the activations, it catastrophically fails for the neural gradients.
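
For context, N:M fine-grained sparsity keeps at most N non-zero values in every block of M consecutive elements. Below is a minimal NumPy sketch of magnitude-based 2:4 pruning of a tensor; it only illustrates the sparsity pattern, not the paper's criterion for neural gradients:

```python
import numpy as np

def prune_n_m(tensor, n=2, m=4):
    """Keep the n largest-magnitude entries in each consecutive block of m (magnitude pruning sketch)."""
    flat = tensor.reshape(-1, m)                        # assumes tensor.size is divisible by m
    # Indices of the (m - n) smallest-magnitude entries in each block
    drop_idx = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop_idx, False, axis=1)
    return (flat * mask).reshape(tensor.shape)

x = np.random.randn(2, 8)
print(prune_n_m(x))   # every group of 4 consecutive values has at most 2 non-zeros
```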

Logarithmic Unbiased Quantization: Simple 4-bit Training in Deep Learning

no code implementations19 Dec 2021 Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry

Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phases to 4-bit, achieving state-of-the-art results in 4-bit training without overhead.

Quantization
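
A key ingredient of unbiased logarithmic quantization is stochastic rounding between adjacent powers of two, so the quantized value equals the input in expectation. The NumPy sketch below illustrates only that unbiasedness property; it is not the LUQ implementation and omits scaling, underflow handling, and the 4-bit format details:

```python
import numpy as np

def stochastic_pow2_round(x, rng):
    """Round |x| to one of its two neighboring powers of two, with probabilities chosen
    so that E[quantized] == x (unbiased); the sign is kept. Illustration only."""
    sign = np.sign(x)
    mag = np.abs(x)
    lo = 2.0 ** np.floor(np.log2(mag))     # nearest power of two below |x|
    hi = 2.0 * lo                          # nearest power of two above |x|
    p_hi = (mag - lo) / (hi - lo)          # linear interpolation => unbiased in expectation
    return sign * np.where(rng.random(mag.shape) < p_hi, hi, lo)

rng = np.random.default_rng(0)
x = np.full(100_000, 0.3)
q = stochastic_pow2_round(x, rng)
print(q.mean())   # close to 0.3, even though each q is either 0.25 or 0.5
```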

The Implicit Bias of Minima Stability: A View from Function Space

no code implementations NeurIPS 2021 Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

First, we extend the existing knowledge on minima stability to non-differentiable minima, which are common in ReLU nets.

Logarithmic Unbiased Quantization: Practical 4-bit Training in Deep Learning

no code implementations29 Sep 2021 Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry

Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phase to 4-bit, achieving state-of-the-art results in 4-bit training.

Quantization

Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability

no code implementations24 Sep 2021 Aviv Tamar, Daniel Soudry, Ev Zisselman

In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought.

reinforcement-learning Reinforcement Learning (RL)

Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling

1 code implementation NeurIPS 2021 Niv Giladi, Zvika Ben-Haim, Sella Nevo, Yossi Matias, Daniel Soudry

Background: Floods are the most common natural disaster in the world, affecting the lives of hundreds of millions.

A statistical framework for efficient out of distribution detection in deep neural networks

no code implementations ICLR 2022 Matan Haroush, Tzviel Frostig, Ruth Heller, Daniel Soudry

Our method achieves comparable or better results than state-of-the-art methods on well-accepted OOD benchmarks, without retraining the network parameters or assuming prior knowledge on the test distribution -- and at a fraction of the computational cost.

Autonomous Vehicles Out-of-Distribution Detection +1

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

no code implementations19 Feb 2021 Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.

Inductive Bias

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

1 code implementation NeurIPS 2021 Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Seffi Naor, Daniel Soudry

Finally, to solve the problem of switching between different structure constraints, we suggest a method to convert a pre-trained model with unstructured sparsity to an N:M fine-grained block sparsity model with little to no training.

MixSize: Training Convnets With Mixed Image Sizes for Improved Accuracy, Speed and Scale Resiliency

2 code implementations1 Jan 2021 Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry

Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.
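
One standard mechanism that makes this possible (not necessarily the paper's exact procedure) is global/adaptive pooling, which maps any spatial resolution to a fixed-size feature vector. A minimal PyTorch sketch with a hypothetical toy network:

```python
import torch
import torch.nn as nn

# Toy CNN (not from the paper): adaptive pooling makes the classifier size-agnostic.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # collapses any spatial size to 1x1
    nn.Flatten(),
    nn.Linear(32, 10),
)

for size in (128, 224, 288):          # same weights, different test-time resolutions
    x = torch.randn(2, 3, size, size)
    print(size, model(x).shape)       # torch.Size([2, 10]) for every size
```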

Why Cold Posteriors? On the Suboptimal Generalization of Optimal Bayes Estimates

no code implementations Approximate Inference (AABI) Symposium 2021 Chen Zeno, Itay Golan, Ari Pakman, Daniel Soudry

Recent works have shown that the predictive accuracy of Bayesian deep learning models exhibits substantial improvements when the posterior is raised to a 1/T power with T<1.
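
For reference, the "posterior raised to a 1/T power" is the tempered ("cold" when $T<1$) posterior, written here in standard notation (not specific to this paper):

```latex
p_T(\theta \mid \mathcal{D}) \;\propto\; \big[\, p(\mathcal{D} \mid \theta)\, p(\theta) \,\big]^{1/T},
\qquad T < 1 \ \text{(cold)}, \quad T = 1 \ \text{(Bayes posterior)}.
```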

Task Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates

1 code implementation1 Oct 2020 Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry

The optimal Bayesian solution for this requires an intractable online Bayes update to the weights posterior.

Continual Learning
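
The "online Bayes update to the weights posterior" mentioned above is the standard recursion (stated here generically; the paper approximates it with fixed-point variational updates):

```latex
p(\theta \mid \mathcal{D}_{1:t}) \;\propto\; p(\mathcal{D}_t \mid \theta)\; p(\theta \mid \mathcal{D}_{1:t-1}),
```

i.e., the previous posterior serves as the prior for the next data chunk; this exact recursion is intractable for neural network weights, hence the variational approximation.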

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

no code implementations NeurIPS 2020 Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry

We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks".

General Classification

Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?

1 code implementation ICML 2020 Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry

Deep neural networks are typically initialized with random weights, with variances chosen to facilitate signal propagation and stable gradients.

Neural gradients are near-lognormal: improved quantized and sparse training

no code implementations ICLR 2021 Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry

While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations.

Neural Network Compression Quantization

Kernel and Rich Regimes in Overparametrized Models

1 code implementation20 Feb 2020 Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.

Training of Quantized Deep Neural Networks using a Magnetic Tunnel Junction-Based Synapse

no code implementations29 Dec 2019 Tzofnat Greenberg Toledo, Ben Perach, Itay Hubara, Daniel Soudry, Shahar Kvatinsky

A recent example is the GXNOR framework for stochastic training of ternary (TNN) and binary (BNN) neural networks.

Is Feature Diversity Necessary in Neural Network Initialization?

1 code implementation11 Dec 2019 Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry

Standard practice in training neural networks involves initializing the weights in an independent fashion.

The Knowledge Within: Methods for Data-Free Model Compression

no code implementations CVPR 2020 Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry

Then, we demonstrate how these samples can be used to calibrate and fine-tune quantized models without using any real data in the process.

Model Compression

Post training 4-bit quantization of convolutional networks for rapid-deployment

1 code implementation NeurIPS 2019 Ron Banner, Yury Nahshan, Daniel Soudry

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources.

Quantization

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

no code implementations ICLR 2020 Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro

In this paper, we characterize the norm required to realize a function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ as a single hidden-layer ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded, including precisely characterizing which functions can be realized with finite norm.

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency

2 code implementations12 Aug 2019 Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry

Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.

Kernel and Rich Regimes in Overparametrized Models

1 code implementation13 Jun 2019 Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution.

A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off

1 code implementation NeurIPS 2019 Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry

Reducing the precision of weights and activation functions in neural network training, with minimal impact on performance, is essential for the deployment of these models in resource-constrained environments.

Quantization

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models

no code implementations17 May 2019 Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry

With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization lead to margin maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.

ACIQ: Analytical Clipping for Integer Quantization of neural networks

1 code implementation ICLR 2019 Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

We analyze the trade-off between quantization noise and clipping distortion in low precision networks.

Quantization
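
To make the quantization-noise vs. clipping-distortion trade-off concrete, here is a small NumPy sketch of clipped uniform quantization with a tunable clipping threshold alpha; the Gaussian toy data and threshold values are illustrative, not ACIQ's analytical solution:

```python
import numpy as np

def clipped_uniform_quant(x, alpha, n_bits=4):
    """Clip to [-alpha, alpha], then quantize uniformly with 2**n_bits levels."""
    levels = 2 ** n_bits
    step = 2 * alpha / (levels - 1)
    x_clip = np.clip(x, -alpha, alpha)       # clipping distortion grows as alpha shrinks
    return np.round(x_clip / step) * step    # quantization noise grows as alpha grows

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)             # toy tensor with a bell-shaped distribution

for alpha in (0.5, 1.0, 2.0, 4.0, 8.0):
    mse = np.mean((x - clipped_uniform_quant(x, alpha)) ** 2)
    print(f"alpha={alpha:>4}  MSE={mse:.5f}")   # MSE is minimized at an intermediate alpha
```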

How do infinite width bounded norm networks look in function space?

no code implementations13 Feb 2019 Pedro Savarese, Itay Evron, Daniel Soudry, Nathan Srebro

We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or equivalently what is the minimal norm required to approximate a given function.

Augment your batch: better training with larger batches

1 code implementation27 Jan 2019 Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of deep neural networks and datasets.
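
Batch augmentation here means filling a larger batch with multiple differently augmented copies of each sample rather than with more distinct samples. A minimal PyTorch sketch, with a random horizontal flip standing in for the augmentation pipeline (the repeat factor and transform are illustrative):

```python
import torch

def augment_batch(x, repeats=4):
    """Return a batch containing `repeats` independently augmented copies of each image.
    The augmentation here (random horizontal flip) is just a stand-in."""
    copies = []
    for _ in range(repeats):
        flip = torch.rand(x.size(0)) < 0.5                        # per-sample coin flip
        xc = torch.where(flip[:, None, None, None], x.flip(-1), x)
        copies.append(xc)
    return torch.cat(copies, dim=0)                               # shape: (repeats * B, C, H, W)

x = torch.randn(8, 3, 32, 32)      # original batch
big_batch = augment_batch(x)        # one optimizer step now averages gradients over 32 samples
print(big_batch.shape)              # torch.Size([32, 3, 32, 32])
```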

Post-training 4-bit quantization of convolution networks for rapid-deployment

2 code implementations2 Oct 2018 Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources.

Quantization

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

no code implementations5 Jun 2018 Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry

We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data.

Implicit Bias of Gradient Descent on Linear Convolutional Networks

no code implementations NeurIPS 2018 Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain.
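
An informal restatement of this claim (paraphrased; $\hat{\beta}$ denotes the discrete Fourier transform of the effective linear predictor $\beta$): gradient descent converges in direction to a stationary point of

```latex
\min_{\beta}\; \big\lVert \hat{\beta} \big\rVert_{2/L}
\quad \text{s.t.} \quad y_n\, \beta^{\top} x_n \ge 1 \ \ \forall n .
```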

Scalable Methods for 8-bit Training of Neural Networks

3 code implementations NeurIPS 2018 Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry

Armed with this knowledge, we quantize the model parameters, activations and layer gradients to 8-bit, leaving at a higher precision only the final step in the computation of the weight gradients.

Quantization

The Global Optimization Geometry of Shallow Linear Neural Networks

no code implementations13 May 2018 Zhihui Zhu, Daniel Soudry, Yonina C. Eldar, Michael B. Wakin

We examine the squared error loss landscape of shallow linear neural networks.

Task Agnostic Continual Learning Using Online Variational Bayes

2 code implementations27 Mar 2018 Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry

However, research for scenarios in which task boundaries are unknown during training has been lacking.

Continual Learning

Convergence of Gradient Descent on Separable Data

no code implementations5 Mar 2018 Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry

We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.

Norm matters: efficient and accurate normalization schemes in deep networks

4 code implementations NeurIPS 2018 Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry

Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications.

Characterizing Implicit Bias in Terms of Optimization Geometry

no code implementations ICML 2018 Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

We study the implicit bias of generic optimization methods, such as mirror descent, natural gradient descent, and steepest descent with respect to different potentials and norms, when optimizing underdetermined linear regression or separable linear classification problems.

General Classification regression

On the Blindspots of Convolutional Networks

no code implementations14 Feb 2018 Elad Hoffer, Shai Fine, Daniel Soudry

Deep convolutional network has been the state-of-the-art approach for a wide variety of tasks over the last few years.

The Implicit Bias of Gradient Descent on Separable Data

2 code implementations ICLR 2018 Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.
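
The paper's well-known conclusion for this setting, stated here for context: the iterates diverge in norm but converge in direction to the $L_2$ maximum-margin (hard-margin SVM) separator,

```latex
\lim_{t\to\infty} \frac{w(t)}{\lVert w(t) \rVert} = \frac{\hat{w}}{\lVert \hat{w} \rVert},
\qquad
\hat{w} = \arg\min_{w} \lVert w \rVert_2^2 \ \ \text{s.t.} \ \ y_n\, w^{\top} x_n \ge 1 \ \forall n .
```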

Train longer, generalize better: closing the generalization gap in large batch training of neural networks

1 code implementation NeurIPS 2017 Elad Hoffer, Itay Hubara, Daniel Soudry

Following this hypothesis we conducted experiments to show empirically that the "generalization gap" stems from the relatively small number of updates rather than the batch size, and can be completely eliminated by adapting the training regime used.

Exponentially vanishing sub-optimal local minima in multilayer neural networks

1 code implementation ICLR 2018 Daniel Soudry, Elad Hoffer

We prove that, with high probability in the limit of $N\rightarrow\infty$ datapoints, the volume of differentiable regions of the empiric loss containing sub-optimal differentiable local minima is exponentially vanishing in comparison with the same volume of global minima, given standard normal input of dimension $d_{0}=\tilde{\Omega}\left(\sqrt{N}\right)$, and a more realistic number of $d_{1}=\tilde{\Omega}\left(N/d_{0}\right)$ hidden units.

Binary Classification

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

5 code implementations22 Sep 2016 Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved comparable accuracy as their 32-bit counterparts using only 4-bits.

No bad local minima: Data independent training error guarantees for multilayer neural networks

no code implementations26 May 2016 Daniel Soudry, Yair Carmon

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima.

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

26 code implementations9 Feb 2016 Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio

We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time.

Binarized Neural Networks

2 code implementations NeurIPS 2016 Itay Hubara, Daniel Soudry, Ran El-Yaniv

We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time and when computing the parameters' gradient at train-time.
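
The core trick for training with binary weights and activations is to binarize with the sign function in the forward pass while passing gradients through a clipped identity (the straight-through estimator). A minimal PyTorch sketch of that estimator (illustrative, not the released BNN code):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: map x to {-1, +1}. Backward: pass the gradient through where |x| <= 1
    (straight-through estimator with saturation), zero elsewhere."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

x = torch.randn(5, requires_grad=True)
y = BinarizeSTE.apply(x)
y.sum().backward()
print(y, x.grad)   # binary outputs; gradients flow only where |x| <= 1
```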

Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights

2 code implementations NeurIPS 2014 Daniel Soudry, Itay Hubara, Ron Meir

Using online EP and the central limit theorem we find an analytical approximation to the Bayes update of this posterior, as well as the resulting Bayes estimates of the weights and outputs.

Binary text classification text-classification +1

A structured matrix factorization framework for large scale calcium imaging data analysis

11 code implementations9 Sep 2014 Eftychios A. Pnevmatikakis, Yuanjun Gao, Daniel Soudry, David Pfau, Clay Lacefield, Kira Poskanzer, Randy Bruno, Rafael Yuste, Liam Paninski

We present a structured matrix factorization approach to analyzing calcium imaging recordings of large neuronal ensembles.

Neurons and Cognition Quantitative Methods Applications

Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter

no code implementations NeurIPS 2012 Dmitri B. Chklovskii, Daniel Soudry

If noise-shaping were used in neurons, it would introduce correlations in spike timing to reduce low-frequency (up to Nyquist) transmission error at the cost of high-frequency one (from Nyquist to sampling rate).
