no code implementations • 14 Feb 2021 • Urmish Thakker, Paul N. Whatmough, ZhiGang Liu, Matthew Mattina, Jesse Beu
Additionally, results with doped Kronecker product matrices demonstrate state-of-the-art compression at large compression factors (10-25x) across 4 natural language processing applications with minor loss in accuracy.
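To make "doped" concrete: a doped structured matrix is the sum of a compressed structured term (here a Kronecker product) and an extremely sparse additive term. A minimal NumPy sketch, with illustrative shapes and sparsity rather than the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.standard_normal((16, 16))  # Kronecker factor
B = rng.standard_normal((16, 16))  # Kronecker factor

# Doping: an extremely sparse additive matrix (~1% non-zeros here)
# that restores expressiveness lost to the rigid KP structure.
S = rng.standard_normal((256, 256)) * (rng.random((256, 256)) < 0.01)

W = np.kron(A, B) + S  # doped Kronecker product weight matrix
```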
no code implementations • EMNLP (sustainlp) 2020 • Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
We evaluate the impact of this technique on 5 NLP benchmarks across multiple tasks (Translation, Intent Detection, Language Modeling) and show that for similar accuracy values and compression factors, HMF can achieve more than 2.32x faster inference run-time than pruning and 16.77% better accuracy than LMF.
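As a rough sketch of the contrast (a simplified reading in which LMF factorizes the whole weight matrix as a low-rank product, while HMF keeps one block unconstrained and factorizes only the rest; the block size and rank below are illustrative):

```python
import numpy as np

m, n, r, k = 512, 512, 32, 64

# LMF: W ~= U @ V, with r * (m + n) parameters.
U, V = np.random.randn(m, r), np.random.randn(r, n)
W_lmf = U @ V

# HMF (simplified): keep k rows full rank, factorize the remaining rows.
C = np.random.randn(k, n)                                # unconstrained block
U2, V2 = np.random.randn(m - k, r), np.random.randn(r, n)
W_hmf = np.vstack([C, U2 @ V2])
```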
no code implementations • 3 Aug 2020 • Dibakar Gope, Jesse Beu, Matthew Mattina
While existing SIMD matrix multiplication instructions for symmetric bit-width operands can support operands of mixed precision by zero- or sign-extending the narrow operand to match the size of the other operands, they cannot exploit the benefit of the narrower bit-width of one of the operands.
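To make that baseline concrete, here is a hypothetical Python sketch of widening a signed 4-bit operand to 8 bits before a symmetric 8x8-bit multiply-accumulate, so the hardware performs a full-width multiply and gains nothing from the narrow operand:

```python
def sign_extend_int4(x):
    # Interpret the low 4 bits of x as a signed 4-bit value in [-8, 7].
    x &= 0xF
    return x - 16 if x & 0x8 else x

def mixed_precision_mac(acc, a_int8, b_int4):
    # Baseline: widen the int4 operand, then do a symmetric int8 x int8 MAC.
    return acc + a_int8 * sign_extend_int4(b_int4)
```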
no code implementations • 24 Jan 2020 • Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu
Kronecker products (KP) have been used to compress IoT RNN applications by compression factors of 15-38x, achieving better results than traditional compression methods.
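A quick sketch of where such factors come from: the dense matrix has (m1\*m2) x (n1\*n2) entries, while the two KP factors store only m1\*n1 + m2\*n2 parameters (the shapes below are hypothetical; in practice the factor shapes are chosen to balance accuracy, yielding ranges like 15-38x):

```python
def kp_compression_factor(m1, n1, m2, n2):
    # Dense W is (m1*m2) x (n1*n2); KP stores A (m1 x n1) and B (m2 x n2).
    dense_params = (m1 * m2) * (n1 * n2)
    kp_params = m1 * n1 + m2 * n2
    return dense_params / kp_params

print(kp_compression_factor(16, 16, 16, 16))  # 65536 / 512 = 128.0
```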
no code implementations • 4 Nov 2019 • Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina
Using this proposed quantization method, we quantized a substantial portion of weight filters of MobileNets to ternary values resulting in 27.98% savings in energy, and a 51.07% reduction in the model size, while achieving comparable accuracy and no degradation in throughput on specialized hardware in comparison to the baseline full-precision MobileNets.
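The paper's per-layer hybrid scheme is more involved, but a generic threshold-based ternarizer (in the spirit of Ternary Weight Networks; the 0.7 factor is a common heuristic, not necessarily this paper's choice) looks like:

```python
import numpy as np

def ternarize(W, thresh_factor=0.7):
    # Map weights to {-alpha, 0, +alpha} using a magnitude threshold.
    delta = thresh_factor * np.mean(np.abs(W))
    mask = np.abs(W) > delta
    alpha = np.abs(W[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(W) * mask
```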
no code implementations • 4 Oct 2019 • Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, Matthew Mattina
This paper introduces a method to compress RNNs for resource-constrained environments using the Kronecker product (KP).
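One reason KP layers stay fast at inference time is that the compressed matrix never needs to be materialized: using the identity kron(A, B) @ x = (A @ X @ B.T).reshape(-1) with X = x.reshape(n1, n2), a minimal sketch:

```python
import numpy as np

def kp_matvec(A, B, x):
    # Computes np.kron(A, B) @ x without forming the full Kronecker product.
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape(n1, n2)             # row-major reshape of the input
    return (A @ X @ B.T).reshape(-1)  # result has shape (m1 * m2,)

A, B = np.random.randn(4, 3), np.random.randn(5, 2)
x = np.random.randn(3 * 2)
assert np.allclose(np.kron(A, B) @ x, kp_matvec(A, B, x))
```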
no code implementations • 12 Jun 2019 • Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities, and still face run-time constraints.
no code implementations • 7 Jun 2019 • Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, Matthew Mattina
Recurrent Neural Networks (RNNs) can be difficult to deploy on resource-constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy.
no code implementations • 4 Mar 2019 • Partha Maji, Andrew Mundy, Ganesh Dasika, Jesse Beu, Matthew Mattina, Robert Mullins
The Winograd (or Cook-Toom) class of algorithms helps reduce the overall compute complexity of many modern deep convolutional neural networks (CNNs).
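For example, the classic Winograd F(2,3) kernel computes two outputs of a 3-tap 1D convolution (correlation, as used in CNNs) with 4 multiplications instead of 6; a minimal sketch:

```python
import numpy as np

def winograd_f23(d, g):
    # F(2,3): two outputs of the 1D correlation of input tile d (4 values)
    # with filter g (3 taps), using 4 multiplications instead of 6.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d, g = np.random.randn(4), np.random.randn(3)
assert np.allclose(winograd_f23(d, g), [d[:3] @ g, d[1:] @ g])
```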