Search Results for author: Tim Dettmers

Found 8 papers, 6 papers with code

8-bit Optimizers via Block-wise Quantization

2 code implementations • 6 Oct 2021 • Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer

To maintain stability and performance, we combine block-wise quantization with two additional changes: (1) dynamic quantization, a form of non-linear optimization that is precise for both large and small magnitude values, and (2) a stable embedding layer to reduce gradient variance that comes from the highly non-uniform distribution of input tokens in language models.

Language Modelling • Machine Translation +1
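Block-wise quantization is straightforward to sketch: the tensor is split into independent blocks, each block is normalized by its absolute maximum, and the normalized values are snapped to an 8-bit codebook. Below is a minimal Python illustration; the 256-value nonlinear codebook and the block size are illustrative stand-ins for the paper's dynamic quantization data type (the reference implementation is the bitsandbytes library).

```python
import numpy as np

def make_codebook():
    # Hypothetical nonlinear code, denser near zero, standing in for the
    # paper's dynamic quantization data type.
    lin = np.linspace(-1.0, 1.0, 256)
    return np.sign(lin) * lin**2

def quantize_blockwise(x, codebook, block_size=2048):
    flat = np.pad(x.ravel(), (0, (-x.size) % block_size))
    blocks = flat.reshape(-1, block_size)
    absmax = np.abs(blocks).max(axis=1, keepdims=True) + 1e-12
    normed = blocks / absmax                         # each block in [-1, 1]
    idx = np.abs(normed[..., None] - codebook).argmin(axis=-1)
    return idx.astype(np.uint8), absmax              # 8-bit state + scales

def dequantize_blockwise(idx, absmax, shape, codebook):
    blocks = codebook[idx] * absmax                  # undo per-block scaling
    return blocks.ravel()[: np.prod(shape)].reshape(shape)
```

Because each block carries its own scale, a single outlier only degrades precision within its own block, which is what keeps the 8-bit optimizer states stable.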

BASE Layers: Simplifying Training of Large, Sparse Models

1 code implementation • 30 Mar 2021 • Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer

Sparse layers can dramatically improve the efficiency of training and inference by routing each token to specialized expert modules that contain only a small fraction of the model parameters.
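The routing step can be viewed as a balanced assignment problem: each token goes to exactly one expert and every expert receives the same number of tokens. The sketch below uses SciPy's Hungarian solver as a stand-in for the scalable assignment algorithm used in the paper; all names and sizes are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_route(token_emb, expert_emb):
    """Assign each token to one expert so that every expert receives the
    same number of tokens (assumes n_tokens % n_experts == 0)."""
    n_tokens = token_emb.shape[0]
    n_experts = expert_emb.shape[0]
    capacity = n_tokens // n_experts
    scores = token_emb @ expert_emb.T                # (n_tokens, n_experts)
    # Replicate each expert 'capacity' times so the assignment is balanced.
    cost = -np.repeat(scores, capacity, axis=1)      # negate to maximize
    rows, cols = linear_sum_assignment(cost)
    return cols[np.argsort(rows)] // capacity        # expert index per token

tokens, experts = np.random.randn(8, 16), np.random.randn(4, 16)
print(balanced_route(tokens, experts))               # one expert per token
```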

Sparse Networks from Scratch: Faster Training without Losing Performance

2 code implementations • ICLR 2020 • Tim Dettmers, Luke Zettlemoyer

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels.

Image Classification • Sparse Learning
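One way to keep weights sparse throughout training is a periodic prune-and-regrow step. The following is a minimal sketch in the spirit of the paper's sparse momentum: drop the smallest active weights, then regrow where the momentum magnitude is largest. The prune fraction and the flat (1-D) parameter layout are illustrative simplifications.

```python
import numpy as np

def prune_and_regrow(weights, mask, momentum, prune_frac=0.2):
    """One redistribution step that keeps the number of nonzeros fixed."""
    active = np.flatnonzero(mask)
    n_move = int(prune_frac * active.size)
    # Prune: remove the active weights with the smallest magnitudes.
    drop = active[np.argsort(np.abs(weights[active]))[:n_move]]
    mask[drop] = False
    weights[drop] = 0.0
    # Regrow: activate the inactive positions with the largest momentum.
    inactive = np.flatnonzero(~mask)
    grow = inactive[np.argsort(-np.abs(momentum[inactive]))[:n_move]]
    mask[grow] = True                     # regrown weights start at zero
    return weights, mask
```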

Jack the Reader - A Machine Reading Framework

1 code implementation • ACL 2018 • Dirk Weissenborn, Pasquale Minervini, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Tim Dettmers, Pontus Stenetorp, Sebastian Riedel

For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions.

Information Retrieval • Language understanding +5

Jack the Reader - A Machine Reading Framework

2 code implementations • 20 Jun 2018 • Dirk Weissenborn, Pasquale Minervini, Tim Dettmers, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel

For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions.

Language understanding • Link Prediction +4

Convolutional 2D Knowledge Graph Embeddings

5 code implementations • 5 Jul 2017 • Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

In this work, we introduce ConvE, a multi-layer convolutional network model for link prediction, and report state-of-the-art results for several established datasets.

Knowledge Graph Embeddings • Knowledge Graphs +1
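The ConvE scoring function itself is compact enough to sketch: reshape the subject and relation embeddings to 2D, stack them, convolve, project back to the embedding dimension, and take dot products with all entity embeddings at once (1-N scoring). The PyTorch sketch below omits the paper's dropout and batch normalization; the embedding size and filter count are illustrative.

```python
import torch
import torch.nn as nn

class ConvEScore(nn.Module):
    def __init__(self, n_entities, n_relations, dim=200, h=10, w=20):
        super().__init__()
        assert h * w == dim
        self.h, self.w = h, w
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.conv = nn.Conv2d(1, 32, kernel_size=3)
        self.fc = nn.Linear(32 * (2 * h - 2) * (w - 2), dim)

    def forward(self, subj, rel):
        # Reshape the subject and relation embeddings to 2D and stack them.
        s = self.ent(subj).view(-1, 1, self.h, self.w)
        r = self.rel(rel).view(-1, 1, self.h, self.w)
        x = torch.cat([s, r], dim=2)              # (B, 1, 2h, w)
        x = torch.relu(self.conv(x)).flatten(1)
        x = self.fc(x)                            # back to embedding dim
        # Score the pair against every entity at once (1-N scoring).
        return x @ self.ent.weight.t()            # logits over entities
```

A sigmoid over the returned logits gives per-entity link probabilities.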

8-Bit Approximations for Parallelism in Deep Learning

no code implementations • 14 Nov 2015 • Tim Dettmers

We show that these approximations do not decrease predictive performance on MNIST, CIFAR10, and ImageNet for both model and data parallelism and provide a data transfer speedup of 2x relative to 32-bit parallelism.
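The underlying idea is to compress values to 8 bits before they cross the network. As a minimal sketch, the plain per-tensor linear quantization below stands in for the paper's more elaborate 8-bit data types; it shows the 4x payload reduction behind the transfer speedup.

```python
import numpy as np

def compress_grad(grad):
    scale = np.abs(grad).max() / 127.0 + 1e-12   # map into the int8 range
    return np.round(grad / scale).astype(np.int8), scale

def decompress_grad(q, scale):
    return q.astype(np.float32) * scale

g = np.random.randn(1024).astype(np.float32)
q, s = compress_grad(g)
print(q.nbytes, "bytes vs", g.nbytes)            # 1024 vs 4096: 4x smaller
```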
