Search Results for author: Andreas Moshovos

Found 16 papers, 0 papers with code

Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

no code implementations • 28 Apr 2022 • Miloš Nikolić, Enrique Torres Sanchez, Jiahui Wang, Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos

We introduce a software-hardware co-design approach that reduces memory traffic and footprint during training with BFloat16 or FP32, boosting energy efficiency and execution-time performance.
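
Since the summary above gives only the high-level goal, here is a minimal NumPy sketch of the general "shrink the floating-point container" idea: keep FP32 tensors but drop low-order mantissa bits so less data has to be stored and moved during training. The truncation rule and the 7-bit mantissa example are illustrative assumptions, not the paper's actual container format or hardware mechanism.

```python
import numpy as np

def truncate_mantissa(x: np.ndarray, kept_mantissa_bits: int) -> np.ndarray:
    """Zero out the low-order mantissa bits of FP32 values, shrinking the
    number of bits that actually need to be stored or moved off-chip."""
    bits = x.astype(np.float32).view(np.uint32)
    drop = 23 - kept_mantissa_bits                      # FP32 has 23 mantissa bits
    mask = np.uint32(0xFFFFFFFF & ~((1 << drop) - 1))   # keep sign, exponent, top mantissa bits
    return (bits & mask).view(np.float32)

weights = np.random.randn(4).astype(np.float32)
print(weights)
print(truncate_mantissa(weights, kept_mantissa_bits=7))  # BFloat16-style 7-bit mantissa
```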

Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models

no code implementations • 23 Mar 2022 • Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos

Mokey reduces the footprint of state-of-the-art 32-bit or 16-bit floating-point transformer models by quantizing all values to 4-bit indexes into dictionaries of representative 16-bit fixed-point centroids.

Quantization
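
A toy NumPy sketch of the dictionary quantization the Mokey summary describes: every value becomes a 4-bit index into a 16-entry dictionary of 16-bit fixed-point centroids. The quantile-based centroid selection and the fixed-point scale used here are assumptions for illustration; the paper's actual centroid-fitting procedure is not reproduced.

```python
import numpy as np

def codebook_quantize(values: np.ndarray, num_centroids: int = 16, frac_bits: int = 12):
    """Replace each value with a 4-bit index into a small dictionary of
    16-bit fixed-point centroids (16 centroids -> indexes fit in 4 bits)."""
    # Pick centroids from the value distribution and round them to fixed point.
    q = np.linspace(0.0, 1.0, num_centroids)
    centroids = np.round(np.quantile(values, q) * (1 << frac_bits)).astype(np.int16)

    # Map each value to the index of its nearest centroid.
    fixed = np.round(values * (1 << frac_bits))
    idx = np.abs(fixed[:, None] - centroids[None, :]).argmin(axis=1).astype(np.uint8)

    dequant = centroids[idx].astype(np.float32) / (1 << frac_bits)
    return idx, centroids, dequant

vals = np.random.randn(1024).astype(np.float32) * 0.1
idx, dict16, approx = codebook_quantize(vals)
print("max index:", idx.max(), "max abs error:", np.abs(vals - approx).max())
```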

APack: Off-Chip, Lossless Data Compression for Efficient Deep Learning Inference

no code implementations • 21 Jan 2022 • Alberto Delmas Lascorz, Mostafa Mahmoud, Andreas Moshovos

When integrated with a Tensorcore-based accelerator, APack boosts speedup and energy efficiency to 1.44x and 1.37x, respectively.

Data Compression • Quantization

FPRaker: A Processing Element For Accelerating Neural Network Training

no code implementations • 15 Oct 2020 • Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos

We demonstrate that FPRaker can be used to compose an accelerator for training and that it improves performance and energy efficiency compared to conventional floating-point units under iso-compute area constraints.

Quantization

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

no code implementations • 1 Sep 2020 • Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos

TensorDash is a hardware-level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams.
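
A back-of-the-envelope sketch of the opportunity the TensorDash summary points at: if a MAC unit can skip products where either operand is zero, the work drops with the sparsity of the operand streams. This models only the potential savings, not TensorDash's actual operand-scheduling hardware.

```python
import numpy as np

def dense_vs_sparse_mac_work(activations: np.ndarray, weights: np.ndarray):
    """Count the MACs a data-parallel unit would issue with and without
    skipping products whose activation or weight operand is zero."""
    dense_macs = activations.size
    effectual_macs = np.count_nonzero((activations != 0) & (weights != 0))
    return dense_macs, effectual_macs

acts = np.random.randn(1 << 16)
acts[np.random.rand(acts.size) < 0.6] = 0.0   # e.g., post-ReLU activation sparsity
wts = np.random.randn(acts.size)
wts[np.random.rand(wts.size) < 0.3] = 0.0     # e.g., pruned weights
dense, sparse = dense_vs_sparse_mac_work(acts, wts)
print(f"potential speedup ~ {dense / sparse:.2f}x")
```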

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

no code implementations • 8 Feb 2020 • Miloš Nikolić, Ghouthi Boukli Hacene, Ciaran Bannon, Alberto Delmas Lascorz, Matthieu Courbariaux, Yoshua Bengio, Vincent Gripon, Andreas Moshovos

Neural networks have demonstrably achieved state-of-the-art accuracy using low-bitlength integer quantization, yielding both execution-time and energy benefits on existing hardware designs that support short bitlengths.

Quantization

Training CNNs faster with Dynamic Input and Kernel Downsampling

no code implementations • 15 Oct 2019 • Zissis Poulos, Ali Nouri, Andreas Moshovos

We reduce training time in convolutional neural networks (CNNs) with a method that, for some of the mini-batches: a) scales down the resolution of the input images via downsampling, and b) reduces the forward-pass operations via pooling on the convolution filters.
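
A hedged PyTorch sketch of the two reductions listed above, applied to a single convolution of a "cheap" mini-batch: downsample the inputs and pool the kernels. The scale factor, pooling choice, and batch-selection policy are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def cheap_forward_conv(images: torch.Tensor, weight: torch.Tensor,
                       image_scale: float = 0.5, kernel_size: int = 2) -> torch.Tensor:
    """One convolution of a 'cheap' mini-batch: (a) downsample the input
    images, and (b) pool the convolution kernels, so the forward pass does
    less work."""
    small_images = F.interpolate(images, scale_factor=image_scale,
                                 mode="bilinear", align_corners=False)
    small_weight = F.adaptive_avg_pool2d(weight, output_size=kernel_size)
    return F.conv2d(small_images, small_weight, padding=kernel_size // 2)

# Example: a batch of 8 RGB images and a 16-filter 3x3 convolution.
x = torch.randn(8, 3, 64, 64)
w = torch.randn(16, 3, 3, 3)
y = cheap_forward_conv(x, w)
print(y.shape)  # roughly (8, 16, 33, 33) with the settings above
```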

Laconic Deep Learning Computing

no code implementations • 10 May 2018 • Sayeh Sharify, Mostafa Mahmoud, Alberto Delmas Lascorz, Milos Nikolic, Andreas Moshovos

A Laconic configuration that uses a 1K-wire weight memory interface outperforms the 2K-wire conventional accelerator by 15.4x and is 1.95x more energy efficient.

Image Classification

DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing

no code implementations • 17 Apr 2018 • Alberto Delmas, Sayeh Sharify, Patrick Judd, Kevin Siu, Milos Nikolic, Andreas Moshovos

The per-group precisions are selected statically for the weights and dynamically, by hardware, for the activations.
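
A small NumPy sketch of per-group precision detection in the spirit of the DPRed summary: each group of fixed-point values needs only enough bits for its largest magnitude. The same calculation can be run offline for weights (static) or on the fly for activations (dynamic). The group size, the 16-bit ceiling, and the sign-bit accounting are illustrative assumptions.

```python
import numpy as np

def group_bitlengths(values: np.ndarray, group_size: int = 16, max_bits: int = 16):
    """Bits needed per group of signed fixed-point values: enough for the
    group's largest magnitude plus one sign bit, capped at max_bits."""
    groups = values.reshape(-1, group_size)
    magnitudes = np.abs(groups).max(axis=1)
    bits = np.clip(np.ceil(np.log2(magnitudes + 1)) + 1, 1, max_bits).astype(int)
    return bits

acts = np.random.randint(-30, 30, size=256)   # quantized activation values
print(group_bitlengths(acts))                 # far fewer than 16 bits per group
```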

Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How

no code implementations • 9 Mar 2018 • Alberto Delmas, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Andreas Moshovos

We show that, during inference with Convolutional Neural Networks (CNNs), more than 2x to 8x ineffectual work can be exposed if, instead of targeting those weights and activations that are zero, we target different combinations of value-stream properties.
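
A rough NumPy model of the "value-stream properties" mentioned above: compare the work a bit-serial MAC array would do under three policies, namely process everything, skip zero-valued operand pairs, or additionally process only the nonzero bits of each activation. The precision, sparsity levels, and counting model are assumptions, not Bit-Tactical's dataflow.

```python
import numpy as np

def work_under_policies(acts: np.ndarray, wts: np.ndarray, bits: int = 16):
    """Count the bit-serial 'terms' processed when doing everything, when
    skipping zero-valued operand pairs, and when also skipping zero bits."""
    acts_q = np.abs(acts).astype(np.uint32)
    nonzero_pair = (acts_q != 0) & (wts != 0)

    dense = acts.size * bits
    skip_zero_values = int(np.count_nonzero(nonzero_pair)) * bits
    popcount = [bin(int(a)).count("1") for a in acts_q[nonzero_pair]]
    skip_zero_bits = sum(popcount)            # essential activation bits only
    return dense, skip_zero_values, skip_zero_bits

acts = np.random.randint(0, 64, size=4096)    # quantized, post-ReLU activations
wts = np.random.randint(-64, 64, size=4096)
wts[np.random.rand(wts.size) < 0.5] = 0       # pruned weights
d, v, b = work_under_policies(acts, wts)
print(f"zero-value skipping: {d / v:.1f}x, plus zero-bit skipping: {d / b:.1f}x")
```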

Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability

no code implementations • 27 Jul 2017 • Alberto Delmas, Sayeh Sharify, Patrick Judd, Andreas Moshovos

Experiments on image classification CNNs show that, on average across all networks studied, TRT outperforms a state-of-the-art bit-parallel accelerator by 1.90x without any loss in accuracy, while being 1.17x more energy efficient.

Image Classification

Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks

no code implementations • 23 Jun 2017 • Sayeh Sharify, Alberto Delmas Lascorz, Kevin Siu, Patrick Judd, Andreas Moshovos

LM can trade off accuracy for additional improvements in execution performance and energy efficiency, and it compares favorably to an accelerator that targets only activation precisions.

Image Classification

Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks

no code implementations • 1 Jun 2017 • Alberto Delmas, Patrick Judd, Sayeh Sharify, Andreas Moshovos

Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial computation to offer performance that is proportional to the fixed-point precision of the activation values.
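
A minimal sketch of the bit-serial computation the Stripes summary refers to: the multiplication consumes one activation bit per cycle, so the cycle count scales with the activation's precision. Signed arithmetic and accumulation across lanes are omitted; this illustrates the principle, not the accelerator's datapath.

```python
def bit_serial_mac(activation: int, weight: int, precision: int) -> tuple[int, int]:
    """Multiply an unsigned fixed-point activation by a weight one activation
    bit per 'cycle': fewer precision bits means fewer cycles."""
    acc = 0
    for bit in range(precision):              # one cycle per activation bit
        if (activation >> bit) & 1:
            acc += weight << bit              # add the weight shifted into place
    return acc, precision                     # (product, cycles used)

product, cycles = bit_serial_mac(activation=13, weight=7, precision=4)
assert product == 13 * 7
print(product, cycles)   # 91 4
```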

Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing

no code implementations • 29 Apr 2017 • Patrick Judd, Alberto Delmas, Sayeh Sharify, Andreas Moshovos

We also present a modified organization that detects activations deemed ineffectual while fetching them from memory.
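
A tiny sketch of the detection step described above, assuming "ineffectual" simply means magnitude below a threshold: while a tile of activations is fetched, record the offsets of the values worth computing on so the rest can be skipped. The thresholding rule is an assumption; the paper's hardware scheme is not reproduced.

```python
import numpy as np

def effectual_offsets(tile: np.ndarray, threshold: float = 0.0):
    """Return the offsets (and values) of activations above the threshold,
    so downstream compute can skip the ineffectual ones."""
    offsets = np.flatnonzero(np.abs(tile) > threshold)
    return offsets, tile[offsets]

tile = np.array([0.0, 0.8, 0.0, 0.0, 0.1, 1.3, 0.0, 0.02], dtype=np.float32)
offs, vals = effectual_offsets(tile, threshold=0.05)
print(offs, vals)   # [1 4 5] [0.8 0.1 1.3]
```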

Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets

no code implementations • 17 Nov 2015 • Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Raquel Urtasun, Andreas Moshovos

A diverse set of CNNs is analyzed, showing that, compared to a conventional implementation that uses a 32-bit floating-point representation for all layers, the data footprint required by these networks can be reduced by an average of 74% and by up to 92%, with less than 1% loss in relative accuracy.
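
To make the footprint claim concrete, here is a minimal Python sketch that computes the reduction from per-layer bitlengths relative to a uniform 32-bit baseline. The layer sizes and bit assignments below are hypothetical, not the networks or precisions studied in the paper.

```python
def footprint_reduction(layer_sizes, layer_bits, baseline_bits=32):
    """Fractional footprint reduction of per-layer reduced-precision data
    versus a uniform 32-bit baseline."""
    base = sum(n * baseline_bits for n in layer_sizes)
    reduced = sum(n * b for n, b in zip(layer_sizes, layer_bits))
    return 1.0 - reduced / base

# Hypothetical 5-layer network: values per layer and chosen bitlengths.
sizes = [150_000, 300_000, 600_000, 600_000, 10_000]
bits  = [10, 8, 8, 6, 12]
print(f"footprint reduced by {footprint_reduction(sizes, bits):.0%}")
```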
