Search Results for author: Samuel L. Smith

Found 20 papers, 8 papers with code

ConvNets Match Vision Transformers at Scale

no code implementations25 Oct 2023 Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De

Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to web-scale datasets.

Unlocking Accuracy and Fairness in Differentially Private Image Classification

2 code implementations21 Aug 2023 Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle

The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy-preserving machine learning in industry.

Classification Fairness +2

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

no code implementations21 Jul 2023 Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith

Deep neural networks based on linear complex-valued RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches to sequence modeling.

Computational Efficiency Position

Differentially Private Diffusion Models Generate Useful Synthetic Images

no code implementations27 Feb 2023 Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle

By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.

Image Generation Privacy Preserving

Unlocking High-Accuracy Differentially Private Image Classification through Scale

2 code implementations28 Apr 2022 Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle

Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.

Classification Image Classification with Differential Privacy +1
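
For context, the formal guarantee referred to here is the standard $(\epsilon, \delta)$-differential-privacy definition (stated below as the textbook definition, not quoted from the paper): a randomized mechanism $M$ is $(\epsilon, \delta)$-DP if, for every pair of datasets $D, D'$ differing in a single record and every set of outputs $S$,

    \Pr[M(D) \in S] \;\le\; e^{\epsilon}\, \Pr[M(D') \in S] + \delta .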

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

no code implementations27 May 2021 Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held-out data when training deep ResNets.

Data Augmentation Image Classification
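
A minimal PyTorch-style sketch of this setup (the `augment` transform and the sample count are placeholder assumptions, not the paper's code): each unique image in the batch is repeated several times and every copy is augmented independently before a single gradient step.

    import torch

    def multi_sample_batch(images, labels, augment, n_samples=4):
        # Repeat each unique image (and its label) n_samples times, then apply
        # the stochastic augmentation independently to every copy; the enlarged
        # batch is used for a single gradient step.
        repeated_images = images.repeat_interleave(n_samples, dim=0)
        repeated_labels = labels.repeat_interleave(n_samples, dim=0)
        return augment(repeated_images), repeated_labels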

High-Performance Large-Scale Image Recognition Without Normalization

19 code implementations11 Feb 2021 Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.

Image Classification Vocal Bursts Intensity Prediction

On the Origin of Implicit Regularization in Stochastic Gradient Descent

no code implementations ICLR 2021 Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

To interpret this phenomenon we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss.
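
Concretely, the modified loss adds a penalty on the minibatch gradient norms to the original loss; up to the precise constants (which should be checked against the paper), it takes the form

    \tilde{C}_{\mathrm{SGD}}(\omega) \;=\; C(\omega) \;+\; \frac{\epsilon}{4m} \sum_{k=1}^{m} \left\| \nabla \hat{C}_k(\omega) \right\|^2 ,

where $\epsilon$ is the learning rate, $m$ the number of minibatches per epoch, and $\hat{C}_k$ the loss on minibatch $k$.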

Characterizing signal propagation to close the performance gap in unnormalized ResNets

4 code implementations ICLR 2021 Andrew Brock, Soham De, Samuel L. Smith

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs.

Cold Posteriors and Aleatoric Uncertainty

no code implementations31 Jul 2020 Ben Adlam, Jasper Snoek, Samuel L. Smith

Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect).

On the Generalization Benefit of Noise in Stochastic Gradient Descent

no code implementations ICML 2020 Samuel L. Smith, Erich Elsen, Soham De

It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks.

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

no code implementations NeurIPS 2020 Soham De, Samuel L. Smith

Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks.

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

no code implementations9 May 2019 Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith

We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions.

Stochastic natural gradient descent draws posterior samples in function space

no code implementations25 Jun 2018 Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein

Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima.

Don't Decay the Learning Rate, Increase the Batch Size

3 code implementations ICLR 2018 Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le

We can further reduce the number of parameter updates by increasing the learning rate $\epsilon$ and scaling the batch size $B \propto \epsilon$.
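
A minimal sketch of the schedule this implies (the milestones, factor, and batch-size cap below are illustrative assumptions, not values from the paper): wherever a conventional schedule would decay the learning rate, the batch size is instead multiplied by the same factor, until it reaches a practical maximum.

    def scaled_schedule(step, base_lr, base_batch, decay_steps, factor=2, max_batch=8192):
        # Replace learning-rate decay with batch-size increase where possible.
        # At each milestone in decay_steps, the batch size is multiplied by
        # `factor` (B proportional to epsilon) until it hits max_batch; only
        # then does the learning rate start to decay. All constants are illustrative.
        lr, batch = base_lr, base_batch
        for milestone in decay_steps:
            if step < milestone:
                break
            if batch * factor <= max_batch:
                batch *= factor      # keep epsilon fixed, grow B
            else:
                lr /= factor         # fall back to decaying epsilon
        return lr, batch

    # Example: with decay_steps=(30_000, 60_000, 80_000), the first two
    # milestones grow the batch size and the last one decays the learning rate.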

A Bayesian Perspective on Generalization and Stochastic Gradient Descent

no code implementations17 Oct 2017 Samuel L. Smith, Quoc V. Le

Interpreting stochastic gradient descent as a stochastic differential equation, we identify the "noise scale" $g = \epsilon (\frac{N}{B} - 1) \approx \epsilon N/B$, where $\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size.
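
A small helper making this quantity concrete (the numbers in the example are arbitrary, not values from the paper):

    def sgd_noise_scale(lr, train_set_size, batch_size):
        # Noise scale g = lr * (N/B - 1), approximately lr * N / B when B << N.
        return lr * (train_set_size / batch_size - 1)

    # Example with arbitrary values: epsilon=0.1, N=50_000, B=128
    g = sgd_noise_scale(0.1, 50_000, 128)   # ~39.0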

Offline bilingual word vectors, orthogonal transformations and the inverted softmax

6 code implementations13 Feb 2017 Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla

We introduce a novel "inverted softmax" for identifying translation pairs, with which we improve the precision@1 of Mikolov's original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian.

Translation
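
A compact NumPy sketch of the inverted-softmax idea (the temperature `beta` and the placeholder vectors are assumptions; the point is the direction of normalization): translation scores are normalized over the source words rather than over the target candidates, which penalizes "hub" target vectors that lie close to everything.

    import numpy as np

    def inverted_softmax_scores(src_vecs, tgt_vecs, beta=10.0):
        # src_vecs: (n_src, d) source embeddings mapped into the target space;
        # tgt_vecs: (n_tgt, d). Returns an (n_src, n_tgt) score matrix whose
        # argmax along axis 1 gives the predicted translation of each source word.
        sims = src_vecs @ tgt_vecs.T
        exp_sims = np.exp(beta * sims)
        # Normalize over the *source* words (axis 0) rather than the target
        # candidates, so hub targets close to many sources are down-weighted.
        return exp_sims / exp_sims.sum(axis=0, keepdims=True)

    # Usage with random unit-norm placeholder vectors:
    rng = np.random.default_rng(0)
    src = rng.standard_normal((5, 300))
    tgt = rng.standard_normal((7, 300))
    src /= np.linalg.norm(src, axis=1, keepdims=True)
    tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
    predictions = inverted_softmax_scores(src, tgt).argmax(axis=1)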

Monte Carlo Sort for unreliable human comparisons

no code implementations27 Dec 2016 Samuel L. Smith

We develop a novel sorting algorithm, where each pairwise comparison reflects a subjective human judgement about which element is bigger or better.

Marketing
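
As a toy illustration of the problem setting only (a naive majority-vote comparator, not the paper's Monte Carlo algorithm), sorting with an unreliable comparator can be simulated by repeating each pairwise query and voting:

    import random

    def noisy_compare(a, b, error_rate=0.2):
        # Simulate an unreliable human judgement: correct with probability 1 - error_rate.
        truth = a < b
        return truth if random.random() > error_rate else not truth

    def majority_vote_less(a, b, n_queries=5, error_rate=0.2):
        # Ask the unreliable comparator several times and take a majority vote.
        votes = sum(noisy_compare(a, b, error_rate) for _ in range(n_queries))
        return votes > n_queries / 2

    def noisy_insertion_sort(items, n_queries=5):
        # Insertion sort driven by the majority-vote comparator (toy baseline only).
        result = []
        for x in items:
            i = 0
            while i < len(result) and majority_vote_less(result[i], x, n_queries):
                i += 1
            result.insert(i, x)
        return result

    print(noisy_insertion_sort([3, 1, 4, 1, 5, 9, 2, 6]))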
