3 code implementations • 29 Feb 2024 • Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Caglar Gulcehre
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale.
no code implementations • 25 Oct 2023 • Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De
Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to web-scale datasets.
2 code implementations • 21 Aug 2023 • Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle
The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy-preserving machine learning in industry.
no code implementations • 21 Jul 2023 • Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith
Deep neural networks based on linear RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches for sequence modeling.
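As a rough illustration of the architecture this entry describes, the sketch below pairs a diagonal linear recurrence with a position-wise MLP. It is a minimal sketch, not the paper's model: the elementwise form h_t = a * h_{t-1} + b * x_t, the ReLU MLP, and the names `diagonal_linear_rnn`, `position_wise_mlp` and `recurrent_block` are assumptions chosen for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def diagonal_linear_rnn(xs: List[Vector], a: Vector, b: Vector) -> List[Vector]:
    """Elementwise (diagonal) linear recurrence h_t = a * h_{t-1} + b * x_t, with h_0 = 0."""
    h = [0.0] * len(a)
    outputs = []
    for x in xs:
        # Each channel evolves independently; no nonlinearity inside the recurrence.
        h = [a_i * h_i + b_i * x_i for a_i, h_i, b_i, x_i in zip(a, h, b, x)]
        outputs.append(h)
    return outputs


def position_wise_mlp(v: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Two-layer ReLU MLP applied to a single time step (no mixing across time)."""
    hidden = [max(0.0, sum(v[i] * w1[i][j] for i in range(len(v)))) for j in range(len(w1[0]))]
    return [sum(hidden[j] * w2[j][k] for j in range(len(hidden))) for k in range(len(w2[0]))]


def recurrent_block(xs: List[Vector], a: Vector, b: Vector, w1: Matrix, w2: Matrix) -> List[Vector]:
    """Linear recurrence across time, then the same MLP independently at every position."""
    return [position_wise_mlp(h, w1, w2) for h in diagonal_linear_rnn(xs, a, b)]
```

Because the recurrence is linear in h, it can also be evaluated with a parallel scan instead of the sequential loop used here; the loop is kept only for readability.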
no code implementations • 27 Feb 2023 • Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle
By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.
2 code implementations • 28 Apr 2022 • Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle
Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.
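A common way to train with a formal DP guarantee is DP-SGD: clip each example's gradient to a fixed norm, sum, and add Gaussian noise scaled to that norm before the update. The sketch below assumes this standard recipe; the function names and parameters (`max_norm`, `noise_multiplier`) are illustrative, and privacy accounting (turning noise level, sampling rate and step count into an (ε, δ) guarantee) is deliberately omitted.

```python
import math
import random
from typing import List

Vector = List[float]


def clip_to_norm(grad: Vector, max_norm: float) -> Vector:
    """Rescale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    params: Vector,
    per_example_grads: List[Vector],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> Vector:
    """One DP-SGD update: clip each example's gradient, sum, add Gaussian noise, average."""
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    summed = [sum(coords) for coords in zip(*clipped)]
    # Noise std is tied to the clipping norm, which bounds any single example's influence.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    batch_size = len(per_example_grads)
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]
```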
no code implementations • 27 May 2021 • Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith
In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held-out data when training deep ResNets.
Ranked #124 on Image Classification on ImageNet
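To make "augmentation samples per unique image" concrete, the sketch below builds a batch by drawing a set of distinct images and emitting several independently augmented copies of each. It is an illustrative assumption about how such batches can be assembled; `multiplicity_batch` and the `augment` callback are hypothetical names, not the paper's code.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def multiplicity_batch(
    dataset: Sequence[T],
    num_unique: int,
    num_augmentations: int,
    augment: Callable[[T, random.Random], T],
    rng: random.Random,
) -> List[T]:
    """Draw num_unique distinct images and emit num_augmentations augmented copies of each."""
    chosen = rng.sample(list(dataset), num_unique)
    # Copies of the same image are adjacent; each copy gets its own call to augment.
    return [augment(image, rng) for image in chosen for _ in range(num_augmentations)]
```

The batch size is num_unique × num_augmentations, so at a fixed batch size, raising the number of augmentations per image lowers the number of distinct images seen per step.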
19 code implementations • 11 Feb 2021 • Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.
Ranked #30 on Image Classification on ImageNet
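The dependence on batch size and the interaction between examples can be seen directly in training-mode batch normalization, which normalizes each input with statistics computed from the whole batch. The minimal sketch below handles a single feature and omits the learned scale and shift; `batch_norm_train` is a hypothetical name.

```python
import math
from typing import List


def batch_norm_train(batch: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature using the mean and variance of the current batch."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


# The same input value (1.0) is normalized differently depending on its batch-mates.
out_a = batch_norm_train([1.0, 2.0, 3.0])[0]
out_b = batch_norm_train([1.0, 10.0, 20.0])[0]
```

With a batch of size one, every output collapses to 0.0, an extreme case of the batch-size dependence.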
no code implementations • ICLR 2021 • Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De
To interpret this phenomenon we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss.
4 code implementations • ICLR 2021 • Andrew Brock, Soham De, Samuel L. Smith
Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs.
no code implementations • 31 Jul 2020 • Ben Adlam, Jasper Snoek, Samuel L. Smith
Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect).
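Tempering means dividing the log posterior by a temperature T: T = 1 recovers standard Bayesian inference and T < 1 gives a sharper, "cold" posterior. The sketch below shows this on a toy coin-flip model with a uniform prior evaluated on a grid; the model, the grid, and the counts are assumptions for illustration and do not reflect the paper's experiments.

```python
import math
from typing import List


def tempered_posterior(grid: List[float], heads: int, tails: int, temperature: float) -> List[float]:
    """Posterior over a coin's bias p on a grid, with the log posterior divided by temperature.

    With a uniform prior, tempering the posterior and tempering the likelihood coincide.
    """
    log_post = [(heads * math.log(p) + tails * math.log(1.0 - p)) / temperature for p in grid]
    peak = max(log_post)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def variance(grid: List[float], probs: List[float]) -> float:
    mean = sum(p * w for p, w in zip(grid, probs))
    return sum((p - mean) ** 2 * w for p, w in zip(grid, probs))


grid = [i / 100 for i in range(1, 100)]
warm = tempered_posterior(grid, heads=7, tails=3, temperature=1.0)  # standard posterior
cold = tempered_posterior(grid, heads=7, tails=3, temperature=0.5)  # "cold" posterior
```

Cooling keeps the mode in place (here at p = 0.7) but concentrates probability mass around it.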
no code implementations • ICML 2020 • Samuel L. Smith, Erich Elsen, Soham De
It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks.
no code implementations • NeurIPS 2020 • Soham De, Samuel L. Smith
Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks.
no code implementations • 9 May 2019 • Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith
We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions.
no code implementations • 25 Jun 2018 • Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Sohl-Dickstein
Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima.
1 code implementation • ICLR 2018 • Vitalii Zhelezniak, Dan Busbridge, April Shen, Samuel L. Smith, Nils Y. Hammerla
Experimental evidence indicates that simple models outperform complex deep networks on many unsupervised similarity tasks.
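One widely used "simple model" for unsupervised similarity is to average the word vectors of each sentence and compare the averages by cosine similarity. The sketch below uses that baseline as an assumed example; the tiny embedding table is made up for illustration, and the entry's own method is not reproduced here.

```python
import math
from typing import Dict, List

Vector = List[float]


def sentence_vector(tokens: List[str], embeddings: Dict[str, Vector]) -> Vector:
    """Mean of the known word vectors in a sentence (zero vector if none are known)."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical 2-d embeddings for illustration only.
toy = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
```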
3 code implementations • ICLR 2018 • Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
We can further reduce the number of parameter updates by increasing the learning rate $\epsilon$ and scaling the batch size $B \propto \epsilon$.
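The arithmetic behind scaling B ∝ ε: multiplying both the learning rate and the batch size by a factor k leaves the approximate SGD noise scale εN/B unchanged while cutting the number of parameter updates per epoch by roughly k. The dataset size, epoch count, learning rate and batch size below are hypothetical values, not settings from the paper.

```python
import math


def parameter_updates(num_epochs: int, train_size: int, batch_size: int) -> int:
    """Number of SGD steps needed for num_epochs passes over the training set."""
    return num_epochs * train_size // batch_size


def approx_noise_scale(lr: float, train_size: int, batch_size: int) -> float:
    """Approximate SGD noise scale eps * N / B (valid when B is much smaller than N)."""
    return lr * train_size / batch_size


N, epochs = 50_000, 90          # hypothetical dataset size and training length
base_lr, base_batch = 0.1, 128  # hypothetical baseline hyper-parameters
k = 4                           # hypothetical factor applied to both lr and batch size

small = parameter_updates(epochs, N, base_batch)      # 35156 steps
large = parameter_updates(epochs, N, k * base_batch)  # 8789 steps
```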
no code implementations • 17 Oct 2017 • Samuel L. Smith, Quoc V. Le
Interpreting stochastic gradient descent as a stochastic differential equation, we identify the "noise scale" $g = \epsilon (\frac{N}{B} - 1) \approx \epsilon N/B$, where $\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size.
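The noise scale formula translates directly into code. The sketch below (function names are illustrative) computes both the exact form and the approximation, which makes two limits easy to check: full-batch training (B = N) gives g = 0, and when B ≪ N the approximation εN/B is close to the exact value.

```python
def noise_scale(lr: float, train_size: int, batch_size: int) -> float:
    """Exact noise scale g = eps * (N / B - 1)."""
    return lr * (train_size / batch_size - 1)


def noise_scale_approx(lr: float, train_size: int, batch_size: int) -> float:
    """Approximation g ≈ eps * N / B, accurate when B << N."""
    return lr * train_size / batch_size
```

For example, with ε = 0.1, N = 50,000 and B = 100, the exact value is 49.9 and the approximation is 50.0.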
6 code implementations • 13 Feb 2017 • Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla
We introduce a novel "inverted softmax" for identifying translation pairs, with which we improve the precision@1 of Mikolov's original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian.
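As we read it, the inverted softmax normalizes translation scores over source words instead of target words, P(j → i) = exp(β S_ij) / Σ_m exp(β S_mj), which penalizes "hub" target words that are close to many source words. The sketch below assumes a precomputed similarity matrix `sim` (rows are source words, columns are target words) and a temperature `beta`; the function name and the toy matrix are illustrative.

```python
import math
from typing import List


def inverted_softmax_translate(sim: List[List[float]], beta: float) -> List[int]:
    """Return, for each source word i, the target j maximizing P(j -> i)."""
    n_src, n_tgt = len(sim), len(sim[0])
    probs = [[0.0] * n_tgt for _ in range(n_src)]
    for j in range(n_tgt):
        # Softmax over source words for a fixed target word j.
        col = [beta * sim[i][j] for i in range(n_src)]
        peak = max(col)
        weights = [math.exp(c - peak) for c in col]
        total = sum(weights)
        for i in range(n_src):
            probs[i][j] = weights[i] / total
    return [max(range(n_tgt), key=lambda j: probs[i][j]) for i in range(n_src)]


# Target 0 is a "hub": it is the nearest neighbour of both source words.
toy_sim = [[0.9, 0.8], [0.9, 0.1]]
```

Plain nearest-neighbour retrieval maps both source words in `toy_sim` to the hub (target 0); the inverted softmax with β = 10 sends source 0 to target 1 instead.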
no code implementations • 27 Dec 2016 • Samuel L. Smith
We develop a novel sorting algorithm, where each pairwise comparison reflects a subjective human judgement about which element is bigger or better.
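The entry does not describe the algorithm itself, so the sketch below is only a stand-in: standard merge sort, where every comparison is delegated to a judgement callback that could, in practice, ask a person which of two items is bigger or better. It is not the paper's method, and it assumes the judgements are consistent; noisy or intransitive human answers are not handled here.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def judged_merge_sort(items: List[T], is_bigger: Callable[[T, T], bool]) -> List[T]:
    """Sort from smallest to biggest; is_bigger(a, b) is a (possibly human) pairwise judgement."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = judged_merge_sort(items[:mid], is_bigger)
    right = judged_merge_sort(items[mid:], is_bigger)
    merged: List[T] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Exactly one pairwise judgement per loop iteration.
        if is_bigger(left[i], right[j]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


calls = []


def judge(a: int, b: int) -> bool:
    calls.append((a, b))  # placeholder for asking a person "is a bigger or better than b?"
    return a > b


ranked = judged_merge_sort([5, 3, 8, 1, 4], judge)
```

Merge sort needs O(n log n) judgements, which matters when each comparison costs a person's time.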