no code implementations • 27 Sep 2022 • Siddhartha Rao Kamalakara, Acyr Locatelli, Bharat Venkitesh, Jimmy Ba, Yarin Gal, Aidan N. Gomez
Training deep neural networks in low rank, i. e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time.
1 code implementation • 14 Jun 2022 • Sören Mindermann, Jan Brauner, Muhammed Razzak, Mrinank Sharma, Andreas Kirsch, Winnie Xu, Benedikt Höltgen, Aidan N. Gomez, Adrien Morisot, Sebastian Farquhar, Yarin Gal
But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable.
no code implementations • 6 Jul 2021 • Sören Mindermann, Muhammed Razzak, Winnie Xu, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal
We introduce Goldilocks Selection, a technique for faster model training which selects a sequence of training points that are "just right".
3 code implementations • NeurIPS 2021 • Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, Yarin Gal
We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input.
no code implementations • 10 Mar 2021 • Lorenz Kuhn, Clare Lyle, Aidan N. Gomez, Jonas Rothfuss, Yarin Gal
Existing generalization measures that aim to capture a model's simplicity based on parameter counts or norms fail to explain generalization in overparameterized deep neural networks.
1 code implementation • 8 Oct 2020 • Aidan N. Gomez, Oscar Key, Kuba Perlin, Stephen Gou, Nick Frosst, Jeff Dean, Yarin Gal
Motivated by poor resource utilisation in the global setting and poor task performance in the local setting, we introduce a class of intermediary strategies between local and global learning referred to as interlocking backpropagation.
no code implementations • 21 Jul 2020 • Pascal Notin, Aidan N. Gomez, Joanna Yoo, Yarin Gal
Pushing forward the compute efficacy frontier in deep learning is critical for tasks that require frequent model re-training or workloads that entail training a large number of models.
1 code implementation • 8 Jun 2020 • Tim Z. Xiao, Aidan N. Gomez, Yarin Gal
We detect out-of-training-distribution sentences in Neural Machine Translation using the Bayesian Deep Learning equivalent of Transformer models.
1 code implementation • 22 Dec 2019 • Angelos Filos, Sebastian Farquhar, Aidan N. Gomez, Tim G. J. Rudner, Zachary Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, Yarin Gal
From our comparison we conclude that some current techniques which solve benchmarks such as UCI `overfit' their uncertainty to the dataset---when evaluated on our benchmark these underperform in comparison to simpler baselines.
2 code implementations • 31 May 2019 • Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton
Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights.
1 code implementation • NIPS Workshop CDNNRIA 2018 • Aidan N. Gomez, Ivan Zhang, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton
Neural networks are extremely flexible models due to their large number of parameters, which is beneficial for learning, but also highly redundant.
14 code implementations • WS 2018 • Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit
Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
1 code implementation • ICLR 2018 • Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, Lukasz Kaiser
This work details CipherGAN, an architecture inspired by CycleGAN used for inferring the underlying cipher mapping given banks of unpaired ciphertext and plaintext.
no code implementations • ICLR 2018 • Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
We present a single model that yields good results on a number of problems spanning multiple domains.
9 code implementations • NeurIPS 2017 • Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse
Deep residual networks (ResNets) have significantly pushed forward the state-of-the-art on image classification, increasing in performance as networks grow both deeper and wider.
1 code implementation • 16 Jun 2017 • Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
We present a single model that yields good results on a number of problems spanning multiple domains.
567 code implementations • NeurIPS 2017 • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.
Ranked #2 on Multimodal Machine Translation on Multi30K (BLUE (DE-EN) metric)
2 code implementations • ICLR 2018 • Lukasz Kaiser, Aidan N. Gomez, Francois Chollet
In this work, we study how depthwise separable convolutions can be applied to neural machine translation.
Ranked #63 on Machine Translation on WMT2014 English-German