no code implementations • 6 Feb 2024 • Alexander Mathiasen, Hatem Helal, Paul Balanca, Adam Krzywaniak, Ali Parviz, Frederik Hvilshøj, Blazej Banaszewski, Carlo Luschi, Andrew William Fitzgibbon
For comparison, Sch\"utt et al. (2019) spent 626 hours creating a dataset on which they trained their NN for 160h, for a total of 786h; our method achieves comparable performance within 31h.
1 code implementation • 8 Dec 2023 • Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley, Charlie Blake, Carlo Luschi, Douglas Orr
The computational demands of large language model (LLM) inference remain a significant obstacle to the widespread deployment of these models.
2 code implementations • NeurIPS 2023 • Alexander Mathiasen, Hatem Helal, Kerstin Klaser, Paul Balanca, Josef Dean, Carlo Luschi, Dominique Beaini, Andrew Fitzgibbon, Dominic Masters
Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples.
no code implementations • 29 Sep 2023 • Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon
FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models.
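As a rough illustration of what an FP8 cast involves, here is a minimal NumPy sketch that simulates a saturating round-trip through the E4M3 format (4 exponent bits, 3 mantissa bits). It is illustrative only, handles normal values only, and is not the paper's implementation:

```python
import numpy as np

def fp8_e4m3_roundtrip(x: np.ndarray) -> np.ndarray:
    """Simulate casting float32 values to FP8 E4M3 and back.

    Sketch only: saturating cast, normals only (subnormals, which
    extend below 2**-6 with reduced precision, are not modelled).
    """
    max_normal = 448.0                 # largest finite E4M3 value
    x = np.clip(x, -max_normal, max_normal)
    m, e = np.frexp(x)                 # x = m * 2**e, with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0      # 1 implicit + 3 explicit mantissa bits
    return np.ldexp(m, e)

x = np.float32([0.1, 1.7, 300.0, 1000.0])
print(fp8_e4m3_roundtrip(x))           # coarse grid; 1000 saturates to 448
```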
2 code implementations • 20 Mar 2023 • Charlie Blake, Douglas Orr, Carlo Luschi
We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats.
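A minimal sketch of the core idea (not the paper's full scheme, which also applies separate scaling factors in the backward pass): each operation is scaled so that unit-variance inputs produce roughly unit-variance outputs, keeping values comfortably inside the representable range of low-precision formats.

```python
import numpy as np

def unit_scaled_linear(x, w):
    # With unit-variance x and w, dividing by sqrt(fan_in) keeps the
    # output at roughly unit variance, so activations neither overflow
    # nor underflow narrow number formats.
    fan_in = w.shape[0]
    return (x @ w) / np.sqrt(fan_in)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 256)).astype(np.float32)   # unit-variance input
w = rng.standard_normal((256, 512)).astype(np.float32)  # unit-variance weights
y = unit_scaled_linear(x, w)
print(y.std())  # ~1.0
```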
1 code implementation • 22 Nov 2022 • Alberto Cattaneo, Daniel Justus, Harry Mellor, Douglas Orr, Jerome Maloberti, Zhenying Liu, Thorin Farnsworth, Andrew Fitzgibbon, Blazej Banaszewski, Carlo Luschi
We present the award-winning submission to the WikiKG90Mv2 track of OGB-LSC@NeurIPS 2022.
no code implementations • 6 Jun 2022 • Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi
Given the current trend of increasing size and complexity of machine learning architectures, identifying new approaches to improve the computational efficiency of model training has become critically important.
no code implementations • 13 Aug 2021 • Anastasia Dietrich, Frithjof Gressmann, Douglas Orr, Ivan Chelombiev, Daniel Justus, Carlo Luschi
Identifying algorithms for computationally efficient unsupervised training of large language models is an important and active area of research.
no code implementations • 10 Jun 2021 • Ivan Chelombiev, Daniel Justus, Douglas Orr, Anastasia Dietrich, Frithjof Gressmann, Alexandros Koliousis, Carlo Luschi
Attention-based language models have become a critical component of state-of-the-art natural language processing systems.
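For context, the scaled dot-product attention these models are built on fits in a few lines; this is the standard mechanism, not the specific architecture studied in this paper:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Standard attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)   # (..., seq_q, seq_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # numerically stable softmax
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 64))
k = rng.standard_normal((8, 64))
v = rng.standard_normal((8, 64))
out = scaled_dot_product_attention(q, k, v)        # shape (8, 64)
```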
no code implementations • 7 Jun 2021 • Dominic Masters, Antoine Labatie, Zach Eaton-Rosen, Carlo Luschi
Much recent research has been dedicated to improving the efficiency of training and inference for image classification.
no code implementations • NeurIPS 2021 • Antoine Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi
We investigate the reasons for the performance degradation incurred with batch-independent normalization.
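"Batch-independent" here refers to schemes such as layer, instance, or group normalization, which compute statistics per sample rather than across the batch. A minimal sketch of the contrast (learned affine parameters omitted; illustrative, not the paper's proposed method):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Statistics taken over the batch axis: each sample's output
    # depends on the other samples in the batch.
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Statistics taken over the feature axis: batch-independent.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).standard_normal((4, 16))
print(batch_norm(x).shape, layer_norm(x).shape)  # both (4, 16)
```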
1 code implementation • 7 Dec 2020 • Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel
Deep learning models trained on large data sets have been widely successful in both vision and language domains.
1 code implementation • NeurIPS 2020 • Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi
Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters.
3 code implementations • 20 Apr 2018 • Dominic Masters, Carlo Luschi
Modern deep neural network training is typically based on mini-batch stochastic gradient optimization.
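For reference, the mini-batch stochastic gradient update this refers to: with learning rate $\eta$, mini-batch $B_t$ of $m$ examples, and per-example loss $\ell_i$,

```latex
\theta_{t+1} = \theta_t - \eta \, \frac{1}{m} \sum_{i \in B_t} \nabla_\theta \, \ell_i(\theta_t)
```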