Search Results for author: Yoram Singer

Found 23 papers, 6 papers with code

Towards Practical Second Order Optimization for Deep Learning

no code implementations · 1 Jan 2021 · Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent.

Click-Through Rate Prediction · Image Classification · +3

Scalable Second Order Optimization for Deep Learning

1 code implementation · 20 Feb 2020 · Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent.

Image Classification · Language Modelling · +2

Proximity Preserving Binary Code using Signed Graph-Cut

no code implementations · 5 Feb 2020 · Inbal Lav, Shai Avidan, Yoram Singer, Yacov Hel-Or

We show that the proposed approximation is superior to the commonly used spectral methods with respect to both accuracy and complexity.

graph partitioning

Memory Efficient Adaptive Optimization

1 code implementation · NeurIPS 2019 · Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling.

Language Modelling · Machine Translation · +1

Convolutional Bipartite Attractor Networks

no code implementations · 8 Jun 2019 · Michael Iuzzolino, Yoram Singer, Michael C. Mozer

In human perception and cognition, a fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence.

Image Denoising · Imputation · +1

Identity Crisis: Memorization and Generalization under Extreme Overparameterization

no code implementations · ICLR 2020 · Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer

We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task.

Are All Layers Created Equal?

2 code implementations · ICML 2019 Workshop on Deep Learning Phenomena · Chiyuan Zhang, Samy Bengio, Yoram Singer

Broadly speaking, layers of large deep neural networks can be categorized as either "robust" or "critical".

Exponentiated Gradient Meets Gradient Descent

no code implementations · 5 Feb 2019 · Udaya Ghai, Elad Hazan, Yoram Singer

The hypentropy has a natural spectral counterpart which we use to derive a family of matrix-based updates that bridge gradient methods and the multiplicative method for matrices.
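For context, the paper interpolates between the additive gradient-descent update and the classic multiplicative exponentiated-gradient (EG) update. A minimal sketch of those two endpoint updates on the probability simplex (function names and the toy gradient are illustrative, not from the paper's code):

```python
import numpy as np

def eg_update(w, grad, eta):
    """One exponentiated-gradient step on the probability simplex:
    a multiplicative update followed by renormalization."""
    w_new = w * np.exp(-eta * grad)
    return w_new / w_new.sum()

def gd_update(w, grad, eta):
    """One plain gradient-descent step, the additive counterpart."""
    return w - eta * grad

w = np.ones(4) / 4                   # uniform start on the simplex
g = np.array([0.5, -0.2, 0.1, 0.0])  # toy gradient
w = eg_update(w, g, eta=1.0)
assert abs(w.sum() - 1.0) < 1e-12    # EG keeps the iterate on the simplex
```

The hypentropy regularizer of the paper yields a family of updates that contains behavior close to both `eg_update` and `gd_update` as limiting cases.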

Memory-Efficient Adaptive Optimization

3 code implementations · 30 Jan 2019 · Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling.

Language Modelling · Machine Translation · +1
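The memory saving in this line of work comes from replacing Adagrad's per-parameter second-moment accumulator with accumulators over covers of the parameters (rows and columns for a matrix). A minimal SM3-style sketch under that reading; variable names and defaults are mine, not the released implementation:

```python
import numpy as np

def sm3_step(W, G, r, c, lr=0.1, eps=1e-8):
    """One SM3-style step for an m x n matrix parameter W with gradient G.

    Instead of a full m x n accumulator of squared gradients (Adagrad),
    keep a row accumulator r (size m) and a column accumulator c (size n);
    the per-entry statistic is recovered as min(r_i, c_j)."""
    nu = np.minimum(r[:, None], c[None, :]) + G ** 2  # per-entry estimate
    r = nu.max(axis=1)                                # tighten row covers
    c = nu.max(axis=0)                                # tighten column covers
    W = W - lr * G / (np.sqrt(nu) + eps)              # Adagrad-style step
    return W, r, c

m, n = 3, 4
W, r, c = np.zeros((m, n)), np.zeros(m), np.zeros(n)
G = np.ones((m, n))
W, r, c = sm3_step(W, G, r, c)
```

The accumulators cost O(m + n) memory rather than O(mn), which is the point of the method for embedding and softmax layers.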

The Well-Tempered Lasso

no code implementations · ICML 2018 · Yuanzhi Li, Yoram Singer

Every regression parameter in the Lasso changes linearly as a function of the regularization value.

The Well-Tempered Lasso

no code implementations · 8 Jun 2018 · Yuanzhi Li, Yoram Singer

Every regression parameter in the Lasso changes linearly as a function of the regularization value.

Shampoo: Preconditioned Stochastic Tensor Optimization

2 code implementations · ICML 2018 · Vineet Gupta, Tomer Koren, Yoram Singer

Preconditioned gradient methods are among the most general and powerful tools in optimization.

Stochastic Optimization
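Shampoo's matrix case preconditions the gradient from both sides with inverse fourth roots of two small statistics matrices. A minimal dense sketch of that update, assuming eigendecomposition for the fractional root (the paper's implementation adds refinements this omits, and all names here are illustrative):

```python
import numpy as np

def shampoo_step(W, G, L, R, lr=0.1, eps=1e-4):
    """One Shampoo-style step for an m x n matrix parameter W.

    L (m x m) accumulates G G^T, R (n x n) accumulates G^T G; their
    inverse fourth roots precondition the gradient on each side."""
    L = L + G @ G.T
    R = R + G.T @ G

    def inv_quarter_root(M):
        # M is symmetric PSD; eps regularizes near-zero eigenvalues.
        vals, vecs = np.linalg.eigh(M)
        return vecs @ np.diag((vals + eps) ** -0.25) @ vecs.T

    W = W - lr * inv_quarter_root(L) @ G @ inv_quarter_root(R)
    return W, L, R

m, n = 3, 2
W, L, R = np.zeros((m, n)), np.zeros((m, m)), np.zeros((n, n))
G = np.random.default_rng(0).normal(size=(m, n))
W, L, R = shampoo_step(W, G, L, R)
```

The preconditioners are m x m and n x n rather than mn x mn, which is what makes this second-order-flavored method tractable for large layers.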

Learning a neural response metric for retinal prosthesis

no code implementations · ICLR 2018 · Nishal P Shah, Sasidhar Madugula, EJ Chichilnisky, Yoram Singer, Jonathon Shlens

Retinal prostheses for treating incurable blindness are designed to electrically stimulate surviving retinal neurons, causing them to send artificial visual signals to the brain.

A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization

no code implementations · 20 Jun 2017 · Vineet Gupta, Tomer Koren, Yoram Singer

We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning.

Stochastic Optimization
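The simplest instance of adaptive, data-dependent preconditioning in this framework is diagonal Adagrad: each coordinate's effective learning rate shrinks with its accumulated squared gradients. A minimal sketch (names and defaults are mine):

```python
import numpy as np

def adagrad_step(w, g, h, lr=0.1, eps=1e-8):
    """One diagonal-Adagrad step. h accumulates squared gradients,
    acting as a diagonal preconditioner: frequently-updated
    coordinates get smaller steps, rare ones larger steps."""
    h = h + g ** 2
    w = w - lr * g / (np.sqrt(h) + eps)
    return w, h

w, h = np.zeros(3), np.zeros(3)
for g in ([1.0, 0.1, 0.0], [1.0, 0.1, 0.0]):
    w, h = adagrad_step(w, np.asarray(g), h)
```

Full-matrix variants in the framework replace the diagonal `h` with a matrix of outer products of gradients, at higher memory and compute cost.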

Random Features for Compositional Kernels

no code implementations · 22 Mar 2017 · Amit Daniely, Roy Frostig, Vineet Gupta, Yoram Singer

We describe and analyze a simple random feature scheme (RFS) from prescribed compositional kernels.

Sketching and Neural Networks

no code implementations · 19 Apr 2016 · Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar

In stark contrast, our approach of improper learning with a larger hypothesis class allows the sketch size to have only a logarithmic dependence on the degree.

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

no code implementations · NeurIPS 2016 · Amit Daniely, Roy Frostig, Yoram Singer

We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning.

Train faster, generalize better: Stability of stochastic gradient descent

no code implementations · 3 Sep 2015 · Moritz Hardt, Benjamin Recht, Yoram Singer

In the non-convex case, we give a new interpretation of common practices in neural networks, and formally show that popular techniques for training large deep models are indeed stability-promoting.

Zero-Shot Learning by Convex Combination of Semantic Embeddings

2 code implementations · 19 Dec 2013 · Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean

In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage.

Multi-label zero-shot learning
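The construction behind this approach (ConSE) maps an image into the semantic space as a convex combination of the embeddings of its top-k predicted seen classes, then classifies unseen labels by nearest embedding. A minimal sketch under that reading; function names and the toy embeddings are illustrative:

```python
import numpy as np

def conse_embed(probs, class_embeddings, top_k=3):
    """Convex combination of the top-k seen-class embeddings,
    weighted by the classifier's renormalized probabilities."""
    top = np.argsort(probs)[::-1][:top_k]
    weights = probs[top] / probs[top].sum()  # weights sum to 1
    return weights @ class_embeddings[top]

def predict_unseen(query_vec, unseen_embeddings):
    """Nearest unseen class by cosine similarity."""
    sims = unseen_embeddings @ query_vec / (
        np.linalg.norm(unseen_embeddings, axis=1) * np.linalg.norm(query_vec))
    return int(np.argmax(sims))

probs = np.array([0.6, 0.3, 0.05, 0.05])  # toy seen-class posteriors
E = np.eye(4)                             # toy class embeddings
v = conse_embed(probs, E, top_k=2)        # lies between the top-2 embeddings
```

No image-to-embedding mapping is trained; the seen-class classifier and the embedding table are composed directly, which is what distinguishes this from two-stage regression approaches.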

Using Web Co-occurrence Statistics for Improving Image Categorization

no code implementations · 19 Dec 2013 · Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer

Despite the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy.

Common Sense Reasoning · Image Categorization · +1

The Maximum Entropy Relaxation Path

no code implementations · 7 Nov 2013 · Moshe Dubiner, Matan Gavish, Yoram Singer

We show existence and a geometric description of the relaxation path.

Efficient Learning using Forward-Backward Splitting

no code implementations · NeurIPS 2009 · Yoram Singer, John C. Duchi

We derive concrete and very simple algorithms for minimization of loss functions with $\ell_1$, $\ell_2$, $\ell_2^2$, and $\ell_\infty$ regularization.

online learning
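For the l1 case, the forward-backward (FOBOS) step has a closed form: a gradient step followed by soft-thresholding, which is what produces exact zeros in the iterate. A minimal sketch with illustrative names:

```python
import numpy as np

def fobos_l1_step(w, g, lr, lam):
    """One forward-backward splitting step with l1 regularization.

    Forward: an unconstrained gradient step. Backward: the l1
    proximal step, whose closed form is soft-thresholding with
    threshold lr * lam; coordinates below it are set exactly to 0."""
    v = w - lr * g                                             # forward step
    return np.sign(v) * np.maximum(np.abs(v) - lr * lam, 0.0)  # prox step

w = np.array([0.5, -0.3, 0.05])
g = np.zeros(3)                       # zero gradient isolates the prox effect
w = fobos_l1_step(w, g, lr=1.0, lam=0.1)
```

The l2, squared-l2, and l-infinity regularizers in the paper admit similarly simple closed-form backward steps; only the thresholding rule changes.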
