Search Results for author: Samuel L. Smith

Found 20 papers, 8 papers with code

ConvNets Match Vision Transformers at Scale

no code implementations25 Oct 2023 Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De

Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to web-scale datasets.

Unlocking Accuracy and Fairness in Differentially Private Image Classification

2 code implementations21 Aug 2023 Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle

The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy-preserving machine learning in industry.

Classification Fairness +2

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

no code implementations21 Jul 2023 Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith

Deep neural networks based on linear complex-valued RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches to sequence modeling.

Computational Efficiency Position

Differentially Private Diffusion Models Generate Useful Synthetic Images

no code implementations27 Feb 2023 Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle

By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.

Image Generation Privacy Preserving

Unlocking High-Accuracy Differentially Private Image Classification through Scale

2 code implementations28 Apr 2022 Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle

Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.

Classification Image Classification with Differential Privacy +1
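
For context, the formal guarantee referred to here is the standard $(\epsilon, \delta)$-differential-privacy definition (stated below as the textbook definition, not quoted from the paper): a randomized mechanism $M$ is $(\epsilon, \delta)$-DP if, for every pair of datasets $D, D'$ differing in a single record and every set of outputs $S$,

    \Pr[M(D) \in S] \;\le\; e^{\epsilon}\, \Pr[M(D') \in S] + \delta .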

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

no code implementations27 May 2021 Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held-out data when training deep ResNets.

Data Augmentation Image Classification
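
A minimal PyTorch-style sketch of this setup (the `augment` transform and the sample count are placeholder assumptions, not the paper's code): each unique image in the batch is repeated several times and every copy is augmented independently before a single gradient step.

    import torch

    def multi_sample_batch(images, labels, augment, n_samples=4):
        # Repeat each unique image (and its label) n_samples times, then apply
        # the stochastic augmentation independently to every copy; the enlarged
        # batch is used for a single gradient step.
        repeated_images = images.repeat_interleave(n_samples, dim=0)
        repeated_labels = labels.repeat_interleave(n_samples, dim=0)
        return augment(repeated_images), repeated_labels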

High-Performance Large-Scale Image Recognition Without Normalization

19 code implementations11 Feb 2021 Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.

Image Classification Vocal Bursts Intensity Prediction

On the Origin of Implicit Regularization in Stochastic Gradient Descent

no code implementations ICLR 2021 Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

To interpret this phenomenon we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss.
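
Concretely, the modified loss adds a penalty on the minibatch gradient norms to the original loss; up to the precise constants (which should be checked against the paper), it takes the form

    \tilde{C}_{\mathrm{SGD}}(\omega) \;=\; C(\omega) \;+\; \frac{\epsilon}{4m} \sum_{k=1}^{m} \left\| \nabla \hat{C}_k(\omega) \right\|^2 ,

where $\epsilon$ is the learning rate, $m$ the number of minibatches per epoch, and $\hat{C}_k$ the loss on minibatch $k$.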

Characterizing signal propagation to close the performance gap in unnormalized ResNets

4 code implementations ICLR 2021 Andrew Brock, Soham De, Samuel L. Smith

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs.

Cold Posteriors and Aleatoric Uncertainty

no code implementations31 Jul 2020 Ben Adlam, Jasper Snoek, Samuel L. Smith

Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect).

On the Generalization Benefit of Noise in Stochastic Gradient Descent

no code implementations ICML 2020 Samuel L. Smith, Erich Elsen, Soham De

It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks.

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

no code implementations NeurIPS 2020 Soham De, Samuel L. Smith

Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks.

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

no code implementations9 May 2019 Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith

We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions.

Stochastic natural gradient descent draws posterior samples in function space

no code implementations25 Jun 2018 Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein

Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima.

Don't Decay the Learning Rate, Increase the Batch Size

3 code implementations ICLR 2018 Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le

We can further reduce the number of parameter updates by increasing the learning rate $\epsilon$ and scaling the batch size $B \propto \epsilon$.
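
A minimal sketch of the schedule this implies (the milestones, factor, and batch-size cap below are illustrative assumptions, not values from the paper): wherever a conventional schedule would decay the learning rate, the batch size is instead multiplied by the same factor, until it reaches a practical maximum.

    def scaled_schedule(step, base_lr, base_batch, decay_steps, factor=2, max_batch=8192):
        # Replace learning-rate decay with batch-size increase where possible.
        # At each milestone in decay_steps, the batch size is multiplied by
        # `factor` (B proportional to epsilon) until it hits max_batch; only
        # then does the learning rate start to decay. All constants are illustrative.
        lr, batch = base_lr, base_batch
        for milestone in decay_steps:
            if step < milestone:
                break
            if batch * factor <= max_batch:
                batch *= factor      # keep epsilon fixed, grow B
            else:
                lr /= factor         # fall back to decaying epsilon
        return lr, batch

    # Example: with decay_steps=(30_000, 60_000, 80_000), the first two
    # milestones grow the batch size and the last one decays the learning rate.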

A Bayesian Perspective on Generalization and Stochastic Gradient Descent

no code implementations17 Oct 2017 Samuel L. Smith, Quoc V. Le

Interpreting stochastic gradient descent as a stochastic differential equation, we identify the "noise scale" $g = \epsilon (\frac{N}{B} - 1) \approx \epsilon N/B$, where $\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size.
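
A small helper making this quantity concrete (the numbers in the example are arbitrary, not values from the paper):

    def sgd_noise_scale(lr, train_set_size, batch_size):
        # Noise scale g = lr * (N/B - 1), approximately lr * N / B when B << N.
        return lr * (train_set_size / batch_size - 1)

    # Example with arbitrary values: epsilon=0.1, N=50_000, B=128
    g = sgd_noise_scale(0.1, 50_000, 128)   # ~39.0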

Offline bilingual word vectors, orthogonal transformations and the inverted softmax

6 code implementations13 Feb 2017 Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla

We introduce a novel "inverted softmax" for identifying translation pairs, with which we improve the precision@1 of Mikolov's original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian.

Translation
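
A compact NumPy sketch of the inverted-softmax idea (the temperature `beta` and the placeholder vectors are assumptions; the point is the direction of normalization): translation scores are normalized over the source words rather than over the target candidates, which penalizes "hub" target vectors that lie close to everything.

    import numpy as np

    def inverted_softmax_scores(src_vecs, tgt_vecs, beta=10.0):
        # src_vecs: (n_src, d) source embeddings mapped into the target space;
        # tgt_vecs: (n_tgt, d). Returns an (n_src, n_tgt) score matrix whose
        # argmax along axis 1 gives the predicted translation of each source word.
        sims = src_vecs @ tgt_vecs.T
        exp_sims = np.exp(beta * sims)
        # Normalize over the *source* words (axis 0) rather than the target
        # candidates, so hub targets close to many sources are down-weighted.
        return exp_sims / exp_sims.sum(axis=0, keepdims=True)

    # Usage with random unit-norm placeholder vectors:
    rng = np.random.default_rng(0)
    src = rng.standard_normal((5, 300))
    tgt = rng.standard_normal((7, 300))
    src /= np.linalg.norm(src, axis=1, keepdims=True)
    tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
    predictions = inverted_softmax_scores(src, tgt).argmax(axis=1)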

Monte Carlo Sort for unreliable human comparisons

no code implementations27 Dec 2016 Samuel L. Smith

We develop a novel sorting algorithm, where each pairwise comparison reflects a subjective human judgement about which element is bigger or better.

Marketing
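
As a toy illustration of the problem setting only (a naive majority-vote comparator, not the paper's Monte Carlo algorithm), sorting with an unreliable comparator can be simulated by repeating each pairwise query and voting:

    import random

    def noisy_compare(a, b, error_rate=0.2):
        # Simulate an unreliable human judgement: correct with probability 1 - error_rate.
        truth = a < b
        return truth if random.random() > error_rate else not truth

    def majority_vote_less(a, b, n_queries=5, error_rate=0.2):
        # Ask the unreliable comparator several times and take a majority vote.
        votes = sum(noisy_compare(a, b, error_rate) for _ in range(n_queries))
        return votes > n_queries / 2

    def noisy_insertion_sort(items, n_queries=5):
        # Insertion sort driven by the majority-vote comparator (toy baseline only).
        result = []
        for x in items:
            i = 0
            while i < len(result) and majority_vote_less(result[i], x, n_queries):
                i += 1
            result.insert(i, x)
        return result

    print(noisy_insertion_sort([3, 1, 4, 1, 5, 9, 2, 6]))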
