Search Results for author: Jeremy Bernstein

Found 17 papers, 13 papers with code

Training Neural Networks from Scratch with Parallel Low-Rank Adapters

no code implementations26 Feb 2024 Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal

The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication.

A Spectral Condition for Feature Learning

no code implementations26 Oct 2023 Greg Yang, James B. Simon, Jeremy Bernstein

The push to train ever larger neural networks has motivated the study of initialization and training at large network width.

SketchOGD: Memory-Efficient Continual Learning

1 code implementation25 May 2023 Benjamin Wright, Youngjae Min, Jeremy Bernstein, Navid Azizan

This paper proposes a memory-efficient solution to catastrophic forgetting, improving upon an established algorithm known as orthogonal gradient descent (OGD).

Continual Learning
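As context for the entry above: a minimal NumPy sketch of plain orthogonal gradient descent, the baseline that SketchOGD compresses. New-task gradients are projected orthogonal to a stored basis of old-task gradient directions; the basis here grows without bound, which is exactly the memory cost the paper's sketching is designed to remove. Function names and the toy gradients are illustrative, not taken from the paper's code.

```python
import numpy as np

def project_orthogonal(grad, basis):
    """Remove the components of `grad` lying in the span of `basis`.

    `basis` is a (d, k) matrix whose columns are orthonormal directions
    accumulated from gradients of previously learned tasks.
    """
    if basis.shape[1] == 0:
        return grad
    return grad - basis @ (basis.T @ grad)

def extend_basis(basis, grad):
    """Gram-Schmidt step: add a new gradient direction to the basis."""
    residual = project_orthogonal(grad, basis)
    norm = np.linalg.norm(residual)
    if norm < 1e-12:
        return basis
    return np.hstack([basis, (residual / norm)[:, None]])

# toy usage: two "tasks" on a 5-parameter model
rng = np.random.default_rng(0)
w = rng.normal(size=5)
basis = np.empty((5, 0))          # memory grows with stored directions;
                                  # SketchOGD replaces this with a fixed-size sketch
for task in range(2):
    for step in range(100):
        g = rng.normal(size=5)    # placeholder for a real task gradient
        w -= 0.01 * project_orthogonal(g, basis)
    basis = extend_basis(basis, g)  # remember a direction from the finished task
```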

Automatic Gradient Descent: Deep Learning without Hyperparameters

1 code implementation11 Apr 2023 Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue

Automatic gradient descent trains both fully-connected and convolutional networks out-of-the-box and at ImageNet scale.

Second-order methods

Optimisation & Generalisation in Networks of Neurons

1 code implementation18 Oct 2022 Jeremy Bernstein

On generalisation, a new correspondence is proposed between ensembles of networks and individual networks.

Investigating Generalization by Controlling Normalized Margin

1 code implementation8 May 2022 Alexander R. Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue

Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$.

Learning Theory
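A small sketch of the quantity in the snippet above, for a linear classifier: the margin $\gamma$ is the smallest signed score over the training set, and dividing by $\|w\|$ makes it invariant to rescaling the weights, which is the property the paper controls. Names are illustrative.

```python
import numpy as np

def normalized_margin(w, X, y):
    """Return gamma / ||w|| for a linear classifier w on labelled data (X, y)."""
    margins = y * (X @ w)               # signed scores, one per example
    return margins.min() / np.linalg.norm(w)

# toy check: rescaling w leaves the normalized margin unchanged
X = np.array([[1.0, 2.0], [-1.0, -0.5], [2.0, 1.0]])
y = np.array([1, -1, 1])
w = np.array([0.8, 0.3])
assert np.isclose(normalized_margin(w, X, y), normalized_margin(10 * w, X, y))
```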

Kernel Interpolation as a Bayes Point Machine

1 code implementation8 Oct 2021 Jeremy Bernstein, Alex Farhang, Yisong Yue

A Bayes point machine is a single classifier that approximates the majority decision of an ensemble of classifiers.

Bayesian Inference
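To make the definition in the snippet above concrete, a toy NumPy sketch: sample linear classifiers consistent with a small dataset (a crude stand-in for an ensemble drawn from the version space), then compare the ensemble's majority decision with the single classifier built from the mean weight vector. This only illustrates the Bayes-point idea, not the paper's kernel-interpolation argument; the data and sampling scheme are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# tiny linearly separable dataset
X = np.array([[1.0, 1.5], [2.0, 0.5], [-1.0, -1.0], [-0.5, -2.0]])
y = np.array([1, 1, -1, -1])

# rejection-sample linear classifiers consistent with the data
ensemble = []
while len(ensemble) < 201:
    w = rng.normal(size=2)
    if np.all(np.sign(X @ w) == y):
        ensemble.append(w)
ensemble = np.array(ensemble)

x_test = np.array([0.7, -0.2])
majority = np.sign(np.sign(ensemble @ x_test).sum())   # ensemble majority decision
bayes_point = np.sign(ensemble.mean(axis=0) @ x_test)  # single "centre of mass" classifier
print(majority, bayes_point)  # the single classifier typically agrees with the vote
```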

On the Implicit Biases of Architecture & Gradient Descent

no code implementations29 Sep 2021 Jeremy Bernstein, Yisong Yue

Do neural networks generalise because of bias in the functions returned by gradient descent, or bias already present in the network architecture?

Bayesian Inference

Fine-Grained System Identification of Nonlinear Neural Circuits

1 code implementation9 Jun 2021 Dawna Bagherian, James Gornet, Jeremy Bernstein, Yu-Li Ni, Yisong Yue, Markus Meister

We study the problem of sparse nonlinear model recovery of high-dimensional compositional functions.

Computing the Information Content of Trained Neural Networks

1 code implementation1 Mar 2021 Jeremy Bernstein, Yisong Yue

A simple resolution to the conundrum of how heavily overparameterised networks can still generalise is that the number of weights is usually a bad proxy for the actual amount of information stored.

Learning by Turning: Neural Architecture Aware Optimisation

2 code implementations14 Feb 2021 Yang Liu, Jeremy Bernstein, Markus Meister, Yisong Yue

To address the difficulty of tuning optimiser hyperparameters and porting them between architectures, this paper conducts a combined study of neural architecture and optimisation, leading to a new optimiser called Nero: the neuronal rotator.

Learning compositional functions via multiplicative weight updates

1 code implementation NeurIPS 2020 Jeremy Bernstein, Jia-Wei Zhao, Markus Meister, Ming-Yu Liu, Anima Anandkumar, Yisong Yue

This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions.

LEMMA
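A minimal sketch of a multiplicative-weights-style step, to make the snippet concrete: each weight is rescaled by a factor close to one, so its relative change per step is bounded, which is the property a descent lemma for compositional (layered) functions cares about. This is an illustrative rule in the general family the paper studies, not necessarily the exact update (Madam) analysed there.

```python
import numpy as np

def multiplicative_update(w, grad, lr=0.01):
    """One generic multiplicative-weights-style step.

    Each weight is multiplied by a factor in [exp(-lr), exp(lr)], so the
    relative change per step is bounded. The sign pattern makes the step a
    descent direction: |w| shrinks when w and grad share a sign, grows otherwise.
    """
    return w * np.exp(-lr * np.sign(w) * np.sign(grad))

w = np.array([0.5, -0.3, 1.2])
g = np.array([0.1, -0.2, 0.4])
print(multiplicative_update(w, g))  # magnitudes change by at most a factor exp(lr)
```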

On the distance between two neural networks and the stability of learning

2 code implementations NeurIPS 2020 Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu

This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.

LEMMA
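A hedged sketch in the spirit of the snippet above: if the distance between two networks is measured per layer and relative to each layer's scale, a natural optimiser scales each layer's step by $\|W\| / \|\nabla W\|$, so every layer moves by roughly the same relative amount. This is a sketch of that idea, not the exact optimiser (Fromage) derived in the paper.

```python
import numpy as np

def relative_step(W, grad, eta=0.01, eps=1e-12):
    """Take a step whose size is relative to the layer's own weight norm.

    The effective step is eta * ||W|| / ||grad|| in the gradient direction,
    so the layer's relative change per step is approximately eta.
    """
    g_norm = np.linalg.norm(grad) + eps
    w_norm = np.linalg.norm(W) + eps
    return W - eta * (w_norm / g_norm) * grad

W = np.random.default_rng(0).normal(size=(4, 3))
G = np.random.default_rng(1).normal(size=(4, 3))
W_new = relative_step(W, G)
print(np.linalg.norm(W_new - W) / np.linalg.norm(W))  # ~eta, the relative change
```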

signSGD with Majority Vote is Communication Efficient And Fault Tolerant

3 code implementations ICLR 2019 Jeremy Bernstein, Jia-Wei Zhao, Kamyar Azizzadenesheli, Anima Anandkumar

Workers transmit only the sign of their gradient vector to a server, and the overall update is decided by a majority vote.

Benchmarking
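A minimal sketch of the communication pattern described above: each worker transmits one bit per coordinate (the gradient sign), the server takes an elementwise majority vote, and the voted sign is applied as the update. Names and the toy gradients are illustrative.

```python
import numpy as np

def majority_vote_step(w, worker_grads, lr=0.001):
    """One round of sign-based distributed SGD with majority vote.

    Each worker sends only sign(gradient); the server takes an elementwise
    majority vote over workers and the voted sign becomes the shared update.
    """
    signs = np.sign(worker_grads)        # what each worker transmits (1 bit / coord)
    vote = np.sign(signs.sum(axis=0))    # elementwise majority at the server
    return w - lr * vote

rng = np.random.default_rng(0)
w = np.zeros(10)
worker_grads = rng.normal(loc=0.5, scale=1.0, size=(7, 10))  # 7 workers, noisy grads
w = majority_vote_step(w, worker_grads)
```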

signSGD: Compressed Optimisation for Non-Convex Problems

3 code implementations ICML 2018 Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar

Using a theorem by Gauss we prove that majority vote can achieve the same reduction in variance as full precision distributed SGD.
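A toy Monte Carlo illustrating the claim in the snippet: a single noisy worker often reports the wrong gradient sign, but the majority vote over many workers recovers the true sign with high probability. The numbers below are invented for illustration and do not reproduce the paper's Gaussian-tail analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
true_sign = 1.0            # assume the true gradient coordinate is positive
noise_scale = 3.0
workers, trials = 31, 10_000

# each worker observes the true gradient plus heavy noise and reports its sign
obs = true_sign + noise_scale * rng.normal(size=(trials, workers))
single = np.mean(np.sign(obs[:, 0]) == true_sign)                 # one worker alone
voted = np.mean(np.sign(np.sign(obs).sum(axis=1)) == true_sign)   # majority of 31
print(f"correct sign: single worker {single:.2f}, majority vote {voted:.2f}")
```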
