no code implementations • 6 Feb 2024 • Alexander Mathiasen, Hatem Helal, Paul Balanca, Adam Krzywaniak, Ali Parviz, Frederik Hvilshøj, Blazej Banaszewski, Carlo Luschi, Andrew William Fitzgibbon
For comparison, Sch\"utt et al. (2019) spent 626 hours creating a dataset on which they trained their NN for 160h, for a total of 786h; our method achieves comparable performance within 31h.
1 code implementation • 8 Dec 2023 • Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley, Charlie Blake, Carlo Luschi, Douglas Orr
The computational demands of large language model (LLM) inference remain a significant obstacle to the widespread deployment of these models.
2 code implementations • NeurIPS 2023 • Alexander Mathiasen, Hatem Helal, Kerstin Klaser, Paul Balanca, Josef Dean, Carlo Luschi, Dominique Beaini, Andrew Fitzgibbon, Dominic Masters
Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples.
no code implementations • 29 Sep 2023 • Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon
FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models.
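As a rough illustration of what an FP8 cast involves, here is a minimal NumPy sketch that simulates a saturating round-trip through the E4M3 format (4 exponent bits, 3 mantissa bits). It is illustrative only, handles normal values only, and is not the paper's implementation:

```python
import numpy as np

def fp8_e4m3_roundtrip(x: np.ndarray) -> np.ndarray:
    """Simulate casting float32 values to FP8 E4M3 and back.

    Sketch only: saturating cast, normals only (subnormals, which
    extend below 2**-6 with reduced precision, are not modelled).
    """
    max_normal = 448.0                 # largest finite E4M3 value
    x = np.clip(x, -max_normal, max_normal)
    m, e = np.frexp(x)                 # x = m * 2**e, with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0      # 1 implicit + 3 explicit mantissa bits
    return np.ldexp(m, e)

x = np.float32([0.1, 1.7, 300.0, 1000.0])
print(fp8_e4m3_roundtrip(x))           # coarse grid; 1000 saturates to 448
```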
2 code implementations • 20 Mar 2023 • Charlie Blake, Douglas Orr, Carlo Luschi
We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats.
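A minimal sketch of the core idea (not the paper's full scheme, which also applies separate scaling factors in the backward pass): each operation is scaled so that unit-variance inputs produce roughly unit-variance outputs, keeping values comfortably inside the representable range of low-precision formats.

```python
import numpy as np

def unit_scaled_linear(x, w):
    # With unit-variance x and w, dividing by sqrt(fan_in) keeps the
    # output at roughly unit variance, so activations neither overflow
    # nor underflow narrow number formats.
    fan_in = w.shape[0]
    return (x @ w) / np.sqrt(fan_in)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 256)).astype(np.float32)   # unit-variance input
w = rng.standard_normal((256, 512)).astype(np.float32)  # unit-variance weights
y = unit_scaled_linear(x, w)
print(y.std())  # ~1.0
```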
1 code implementation • 22 Nov 2022 • Alberto Cattaneo, Daniel Justus, Harry Mellor, Douglas Orr, Jerome Maloberti, Zhenying Liu, Thorin Farnsworth, Andrew Fitzgibbon, Blazej Banaszewski, Carlo Luschi
We present the award-winning submission to the WikiKG90Mv2 track of OGB-LSC@NeurIPS 2022.
no code implementations • 6 Jun 2022 • Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi
Given the current trend of increasing size and complexity of machine learning architectures, identifying new approaches to improve the computational efficiency of model training has become critically important.
no code implementations • 13 Aug 2021 • Anastasia Dietrich, Frithjof Gressmann, Douglas Orr, Ivan Chelombiev, Daniel Justus, Carlo Luschi
Identifying algorithms for computationally efficient unsupervised training of large language models is an important and active area of research.
no code implementations • 10 Jun 2021 • Ivan Chelombiev, Daniel Justus, Douglas Orr, Anastasia Dietrich, Frithjof Gressmann, Alexandros Koliousis, Carlo Luschi
Attention-based language models have become a critical component of state-of-the-art natural language processing systems.
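For context, the scaled dot-product attention these models are built on fits in a few lines; this is the standard mechanism, not the specific architecture studied in this paper:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Standard attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)   # (..., seq_q, seq_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # numerically stable softmax
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 64))
k = rng.standard_normal((8, 64))
v = rng.standard_normal((8, 64))
out = scaled_dot_product_attention(q, k, v)        # shape (8, 64)
```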
no code implementations • 7 Jun 2021 • Dominic Masters, Antoine Labatie, Zach Eaton-Rosen, Carlo Luschi
Much recent research has been dedicated to improving the efficiency of training and inference for image classification.
no code implementations • NeurIPS 2021 • Antoine Labatie, Dominic Masters, Zach Eaton-Rosen, Carlo Luschi
We investigate the reasons for the performance degradation incurred with batch-independent normalization.
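"Batch-independent" here refers to schemes such as layer, instance, or group normalization, which compute statistics per sample rather than across the batch. A minimal sketch of the contrast (learned affine parameters omitted; illustrative, not the paper's proposed method):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Statistics taken over the batch axis: each sample's output
    # depends on the other samples in the batch.
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Statistics taken over the feature axis: batch-independent.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).standard_normal((4, 16))
print(batch_norm(x).shape, layer_norm(x).shape)  # both (4, 16)
```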
1 code implementation • 7 Dec 2020 • Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel
Deep learning models trained on large data sets have been widely successful in both vision and language domains.
1 code implementation • NeurIPS 2020 • Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi
Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters.
3 code implementations • 20 Apr 2018 • Dominic Masters, Carlo Luschi
Modern deep neural network training is typically based on mini-batch stochastic gradient optimization.
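For reference, the mini-batch stochastic gradient update this refers to: with learning rate $\eta$, mini-batch $B_t$ of $m$ examples, and per-example loss $\ell_i$,

```latex
\theta_{t+1} = \theta_t - \eta \, \frac{1}{m} \sum_{i \in B_t} \nabla_\theta \, \ell_i(\theta_t)
```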