no code implementations • 15 Jan 2022 • Igor Fedorov, Ramon Matas, Hokchhay Tann, Chuteng Zhou, Matthew Mattina, Paul Whatmough
Deploying TinyML models on low-cost IoT hardware is very challenging, due to limited device memory capacity.
3 code implementations • ICLR 2021 • Durmus Alp Emre Acar, Yue Zhao, Ramon Matas Navarro, Matthew Mattina, Paul N. Whatmough, Venkatesh Saligrama
We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round.
no code implementations • 13 Aug 2021 • Shyam A. Tailor, René de Jong, Tiago Azevedo, Matthew Mattina, Partha Maji
In recent years, graph neural network (GNN)-based approaches have become a popular strategy for processing point cloud data, regularly achieving state-of-the-art performance on a variety of tasks.
no code implementations • 16 Jul 2021 • Zhi-Gang Liu, Paul N. Whatmough, Yuhao Zhu, Matthew Mattina
We propose to exploit structured sparsity, more specifically, Density Bound Block (DBB) sparsity for both weights and activations.
1 code implementation • 22 Feb 2021 • Martin Ferianc, Partha Maji, Matthew Mattina, Miguel Rodrigues
Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation.
no code implementations • 14 Feb 2021 • Urmish Thakker, Paul N. Whatmough, ZhiGang Liu, Matthew Mattina, Jesse Beu
Additionally, results with doped Kronecker product matrices demonstrate state-of-the-art accuracy at large compression factors (10-25x) across 4 natural language processing applications with minor loss in accuracy.
no code implementations • 28 Jan 2021 • Chuteng Zhou, Quntao Zhuang, Matthew Mattina, Paul N. Whatmough
Our SDPI can be applied to various information processing systems, including neural networks and cellular automata.
1 code implementation • 21 Oct 2020 • Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough
To address this challenge, neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints.
Ranked #1 on Keyword Spotting on Google Speech Commands V2 12
no code implementations • EMNLP (sustainlp) 2020 • Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
We evaluate the impact of this technique on 5 NLP benchmarks across multiple tasks (Translation, Intent Detection, Language Modeling) and show that for similar accuracy values and compression factors, HMF can achieve more than 2.32x faster inference run-time than pruning and 16.77% better accuracy than LMF.
1 code implementation • 7 Sep 2020 • Tiago Azevedo, René de Jong, Matthew Mattina, Partha Maji
In this paper, we adapt the well-established YOLOv3 architecture to generate uncertainty estimations by introducing stochasticity in the form of Monte Carlo Dropout (MC-Drop), and evaluate it across different levels of dataset shift.
no code implementations • 4 Sep 2020 • Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina
In this paper, we address a key architectural challenge with structural sparsity: how to provide support for a range of sparsity levels while maintaining high utilization of the hardware.
no code implementations • 3 Aug 2020 • Dibakar Gope, Jesse Beu, Matthew Mattina
While existing SIMD matrix multiplication instructions for symmetric bit-width operands can support operands of mixed precision by zero- or sign-extending the narrow operand to match the size of the other operands, they cannot exploit the benefit of narrow bit-width of one of the operands.
no code implementations • ECCV 2020 • Zhi-Gang Liu, Matthew Mattina
Prior research has shown that the Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point.
1 code implementation • 20 May 2020 • Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough
Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs).
no code implementations • 16 May 2020 • Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina
Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM).
1 code implementation • 25 Feb 2020 • Javier Fernandez-Marques, Paul N. Whatmough, Andrew Mundy, Matthew Mattina
Lightweight architectural designs of Convolutional Neural Networks (CNNs) together with quantization have paved the way for the deployment of demanding computer vision applications on mobile devices.
no code implementations • 24 Jan 2020 • Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu
Kronecker Products (KP) have been used to compress IoT RNN applications by factors of 15-38x, achieving better results than traditional compression methods.
no code implementations • 14 Jan 2020 • Chuteng Zhou, Prad Kadambi, Matthew Mattina, Paul N. Whatmough
Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning.
no code implementations • 18 Nov 2019 • Patrick Hansen, Alexey Vilkin, Yury Khrustalev, James Imber, David Hanwell, Matthew Mattina, Paul N. Whatmough
In this work, we investigate the efficacy of the ISP in CNN classification tasks, and outline the system-level trade-offs between prediction accuracy and computational cost.
no code implementations • 4 Nov 2019 • Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina
Using this proposed quantization method, we quantized a substantial portion of weight filters of MobileNets to ternary values resulting in 27.98% savings in energy, and a 51.07% reduction in the model size, while achieving comparable accuracy and no degradation in throughput on specialized hardware in comparison to the baseline full-precision MobileNets.
no code implementations • 4 Oct 2019 • Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, Matthew Mattina
This paper introduces a method to compress RNNs for resource-constrained environments using the Kronecker product (KP).
no code implementations • 12 Jun 2019 • Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints.
no code implementations • 7 Jun 2019 • Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, Matthew Mattina
Recurrent Neural Networks (RNNs) can be difficult to deploy on resource-constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy.
no code implementations • NeurIPS 2019 • Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul N. Whatmough
The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment.
no code implementations • 4 Mar 2019 • Partha Maji, Andrew Mundy, Ganesh Dasika, Jesse Beu, Matthew Mattina, Robert Mullins
The Winograd or Cook-Toom class of algorithms helps to reduce the overall compute complexity of many modern deep convolutional neural networks (CNNs).
no code implementations • 4 Mar 2019 • Dibakar Gope, Ganesh Dasika, Matthew Mattina
Machine learning-based applications are increasingly prevalent in IoT devices.
no code implementations • 4 Mar 2019 • Zhi-Gang Liu, Matthew Mattina
The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding.
1 code implementation • 27 Feb 2019 • Paul N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo, Matthew Mattina
Over a suite of six datasets we trained models via transfer learning with an accuracy loss of $<1\%$ resulting in up to 11.2 TOPS/W - nearly $2 \times$ more efficient than a conventional programmable CNN accelerator of the same area.
no code implementations • 5 Dec 2018 • Franz Pernkopf, Wolfgang Roth, Matthias Zoehrer, Lukas Pfeifenberger, Guenther Schindler, Holger Froening, Sebastian Tschiatschek, Robert Peharz, Matthew Mattina, Zoubin Ghahramani
In that way, we provide an extensive overview of the current state of the art in robust and efficient machine learning for real-world systems.
no code implementations • 4 Dec 2018 • Paul Whatmough, Chuteng Zhou, Patrick Hansen, Matthew Mattina
On-device CNN inference for real-time computer vision applications can result in computational demands that far exceed the energy budgets of mobile devices.
9 code implementations • 16 Oct 2018 • Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna
Systolic Arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications.
Distributed, Parallel, and Cluster Computing • Hardware Architecture
no code implementations • 29 Mar 2018 • Yuhao Zhu, Anand Samajdar, Matthew Mattina, Paul Whatmough
Specifically, we propose to expose the motion data that is naturally generated by the Image Signal Processor (ISP) early in the vision pipeline to the CNN engine.
no code implementations • 19 Jan 2018 • Yuhao Zhu, Matthew Mattina, Paul Whatmough
Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR, ADAS, etc.