Search Results for author: Matthew Mattina

Found 33 papers, 8 papers with code

UDC: Unified DNAS for Compressible TinyML Models

no code implementations 15 Jan 2022 Igor Fedorov, Ramon Matas, Hokchhay Tann, Chuteng Zhou, Matthew Mattina, Paul Whatmough

Emerging Internet-of-things (IoT) applications are driving deployment of neural networks (NNs) on heavily constrained low-cost hardware (HW) platforms, where accuracy is typically limited by memory capacity.

Model Compression Quantization +1

Federated Learning Based on Dynamic Regularization

1 code implementation ICLR 2021 Durmus Alp Emre Acar, Yue Zhao, Ramon Matas Navarro, Matthew Mattina, Paul N. Whatmough, Venkatesh Saligrama

We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round.

Federated Learning

Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification

no code implementations 13 Aug 2021 Shyam A. Tailor, René de Jong, Tiago Azevedo, Matthew Mattina, Partha Maji

In recent years graph neural network (GNN)-based approaches have become a popular strategy for processing point cloud data, regularly achieving state-of-the-art performance on a variety of tasks.

Mixed Reality

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration

no code implementations 16 Jul 2021 Zhi-Gang Liu, Paul N. Whatmough, Yuhao Zhu, Matthew Mattina

We propose to exploit structured sparsity, more specifically, Density Bound Block (DBB) sparsity for both weights and activations.
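A minimal sketch of what Density Bound Block (DBB) sparsity means in practice, assuming a simple magnitude-based pruning rule (the paper's actual training procedure is not shown here): each fixed-size block of a tensor is constrained to hold at most a bounded number of nonzeros, which is what lets hardware schedule the remaining multiplies predictably.

```python
import numpy as np

def enforce_dbb(weights, block_size=8, max_nonzeros=2):
    """Prune each contiguous block of `block_size` values so that at
    most `max_nonzeros` entries survive (largest magnitudes kept)."""
    flat = weights.flatten()
    # pad so the length is a multiple of block_size
    pad = (-flat.size) % block_size
    flat = np.concatenate([flat, np.zeros(pad)])
    blocks = flat.reshape(-1, block_size)
    # indices of the smallest-magnitude entries in each block, to be zeroed
    drop = np.argsort(np.abs(blocks), axis=1)[:, : block_size - max_nonzeros]
    np.put_along_axis(blocks, drop, 0.0, axis=1)
    return blocks.flatten()[: weights.size].reshape(weights.shape)
```

Applying DBB to both weights and activations, as the paper proposes, bounds the worst-case number of nonzero products per block on both sides of the multiply.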

On the Effects of Quantisation on Model Uncertainty in Bayesian Neural Networks

1 code implementation 22 Feb 2021 Martin Ferianc, Partha Maji, Matthew Mattina, Miguel Rodrigues

Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation.

Autonomous Driving Decision Making

Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices

no code implementations 14 Feb 2021 Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu

Additionally, doped Kronecker product matrices achieve state-of-the-art accuracy at large compression factors (10-25x) across 4 natural language processing applications, with only minor accuracy loss.

Information contraction in noisy binary neural networks and its implications

no code implementations 28 Jan 2021 Chuteng Zhou, Quntao Zhuang, Matthew Mattina, Paul N. Whatmough

Our SDPI can be applied to various information processing systems, including neural networks and cellular automata.

Image Classification Object Detection

Rank and run-time aware compression of NLP Applications

no code implementations EMNLP (sustainlp) 2020 Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina

We evaluate the impact of this technique on 5 NLP benchmarks across multiple tasks (Translation, Intent Detection, Language Modeling) and show that for similar accuracy values and compression factors, HMF can achieve more than 2.32x faster inference run-time than pruning and 16.77% better accuracy than LMF.

Intent Detection Language Modelling +1

Stochastic-YOLO: Efficient Probabilistic Object Detection under Dataset Shifts

1 code implementation 7 Sep 2020 Tiago Azevedo, René de Jong, Matthew Mattina, Partha Maji

In this paper, we adapt the well-established YOLOv3 architecture to generate uncertainty estimations by introducing stochasticity in the form of Monte Carlo Dropout (MC-Drop), and evaluate it across different levels of dataset shift.

Image Classification Object Detection
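The Monte Carlo Dropout idea the snippet describes can be illustrated with a toy two-layer network in NumPy (a generic sketch, not the paper's YOLOv3 setup; all shapes and weights here are hypothetical): dropout stays active at inference time, and the spread across repeated stochastic forward passes serves as the uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(x, W1, W2, p=0.5, n_samples=30):
    """Run `n_samples` stochastic forward passes with dropout kept ON
    at inference time; the standard deviation across passes is a
    per-output uncertainty estimate (Monte Carlo Dropout)."""
    outputs = []
    for _ in range(n_samples):
        h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
        mask = rng.random(h.shape) > p       # fresh dropout mask each pass
        h = h * mask / (1.0 - p)             # inverted dropout scaling
        outputs.append(h @ W2)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.std(axis=0)
```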

Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration

no code implementations 4 Sep 2020 Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

In this paper, we address a key architectural challenge with structural sparsity: how to provide support for a range of sparsity levels while maintaining high utilization of the hardware.

High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands

no code implementations 3 Aug 2020 Dibakar Gope, Jesse Beu, Matthew Mattina

While existing SIMD matrix multiplication instructions for symmetric bit-width operands can support operands of mixed precision by zero- or sign-extending the narrow operand to match the size of the other operands, they cannot exploit the benefit of narrow bit-width of one of the operands.
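The zero-extension behavior described above can be sketched in a few lines of NumPy (hypothetical operand values; real SIMD instructions do this widening per lane in hardware): widening the narrow operand preserves correctness, but the unused upper bits of each weight lane buy no extra throughput.

```python
import numpy as np

# 8-bit signed activations multiplied by 4-bit unsigned weights.
# Equal-width SIMD multiply instructions require the narrow operand
# to be zero-extended to 8 bits first.
a  = np.array([-128, 127, 64], dtype=np.int8)   # 8-bit activations
w4 = np.array([3, 15, 7], dtype=np.uint8)       # values fit in 4 bits
w8 = w4.astype(np.int8)                         # zero-extend: still 3, 15, 7

# Accumulate in a wide (32-bit) register, as quantized kernels do.
acc = int(np.sum(a.astype(np.int32) * w8.astype(np.int32)))
```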

Efficient Residue Number System Based Winograd Convolution

no code implementations ECCV 2020 Zhi-Gang Liu, Matthew Mattina

Prior research has shown that the Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNN) with weights and activations represented in floating point.
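The core saving can be seen in the smallest Winograd instance, F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 a direct computation needs (a scalar sketch for intuition; the paper's contribution is carrying out this arithmetic in a residue number system, which is not shown here).

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap correlation over a
    4-element input tile using 4 multiplies instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    # y0 = d0*g0 + d1*g1 + d2*g2 ;  y1 = d1*g0 + d2*g1 + d3*g2
    return [m1 + m2 + m3, m2 - m3 - m4]
```

The filter-side combinations (g0+g1+g2)/2 and (g0-g1+g2)/2 are computed once per filter, so the per-tile cost is dominated by the 4 multiplies.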

TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

1 code implementation 20 May 2020 Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough

Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs).

Model Compression Quantization +1

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

no code implementations 16 May 2020 Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM).

Searching for Winograd-aware Quantized Networks

1 code implementation 25 Feb 2020 Javier Fernandez-Marques, Paul N. Whatmough, Andrew Mundy, Matthew Mattina

Lightweight architectural designs of Convolutional Neural Networks (CNNs) together with quantization have paved the way for the deployment of demanding computer vision applications on mobile devices.

Neural Architecture Search Quantization

Compressing Language Models using Doped Kronecker Products

no code implementations 24 Jan 2020 Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu

Kronecker Products (KP) have been used to compress IoT RNN Applications by 15-38x compression factors, achieving better results than traditional compression methods.

Language Modelling
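The arithmetic behind compression factors of that magnitude is easy to verify (illustrative shapes, not the paper's actual models): replacing a dense matrix with the Kronecker product of two small factors shrinks the stored parameter count multiplicatively.

```python
import numpy as np

# A 256x256 weight matrix stores 65,536 parameters. Approximating it
# as the Kronecker product of two 16x16 factors stores only
# 16*16 + 16*16 = 512 parameters: a 128x compression factor.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 16))
B = rng.standard_normal((16, 16))
W = np.kron(A, B)                     # reconstructed 256x256 matrix

params_full = W.size
params_kp = A.size + B.size
compression = params_full / params_kp
```

"Doping" (per the title) adds a sparse additive correction matrix on top of the Kronecker structure to recover accuracy, at a small extra parameter cost.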

Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation

no code implementations 14 Jan 2020 Chuteng Zhou, Prad Kadambi, Matthew Mattina, Paul N. Whatmough

Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning.

Knowledge Distillation

ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems

no code implementations 18 Nov 2019 Patrick Hansen, Alexey Vilkin, Yury Khrustalev, James Imber, David Hanwell, Matthew Mattina, Paul N. Whatmough

In this work, we investigate the efficacy of the ISP in CNN classification tasks, and outline the system-level trade-offs between prediction accuracy and computational cost.

Ternary MobileNets via Per-Layer Hybrid Filter Banks

no code implementations 4 Nov 2019 Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina

Using this proposed quantization method, we quantized a substantial portion of weight filters of MobileNets to ternary values resulting in 27.98% savings in energy, and a 51.07% reduction in the model size, while achieving comparable accuracy and no degradation in throughput on specialized hardware in comparison to the baseline full-precision MobileNets.

Quantization
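A minimal magnitude-threshold ternarizer, for context (a generic sketch with a hypothetical threshold; the paper's per-layer hybrid filter banks are more involved): each weight maps to one of {-α, 0, +α}, where α is a shared scale, so a filter can be stored in two bits per weight plus one scale.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Map each weight to {-alpha, 0, +alpha}: zero out small weights,
    then set the rest to a shared scale alpha (the mean magnitude of
    the surviving weights)."""
    mask = np.abs(w) > threshold
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return np.sign(w) * mask * alpha
```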

Pushing the limits of RNN Compression

no code implementations 4 Oct 2019 Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, Matthew Mattina

This paper introduces a method to compress RNNs for resource-constrained environments using Kronecker products (KP).

Run-Time Efficient RNN Compression for Inference on Edge Devices

no code implementations 12 Jun 2019 Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina

Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints.

Edge-computing

Compressing RNNs for IoT devices by 15-38x using Kronecker Products

no code implementations 7 Jun 2019 Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, Matthew Mattina

Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy.

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

no code implementations NeurIPS 2019 Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul N. Whatmough

The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment.

Neural Architecture Search

Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs

no code implementations 4 Mar 2019 Partha Maji, Andrew Mundy, Ganesh Dasika, Jesse Beu, Matthew Mattina, Robert Mullins

The Winograd or Cook-Toom class of algorithms help to reduce the overall compute complexity of many modern deep convolutional neural networks (CNNs).

Learning low-precision neural networks without Straight-Through Estimator (STE)

no code implementations 4 Mar 2019 Zhi-Gang Liu, Matthew Mattina

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding.

Quantization
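For context, the STE's role can be shown in a few lines of NumPy (a generic sketch of the STE itself, not this paper's proposed alternative to it): the forward pass applies a non-differentiable rounding, and the backward pass simply passes the gradient through unchanged inside the clipping range, since the true derivative of round() is zero almost everywhere.

```python
import numpy as np

def quantize(x, n_bits=2):
    """Forward pass: uniform quantizer to 2**n_bits - 1 levels on [-1, 1]."""
    levels = 2 ** n_bits - 1
    return np.round(np.clip(x, -1, 1) * levels) / levels

def ste_grad(upstream, x):
    """Backward pass under the STE: pretend d(quantize)/dx = 1 inside
    the clipping range and 0 outside it."""
    return upstream * ((x >= -1) & (x <= 1)).astype(float)
```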

FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning

1 code implementation 27 Feb 2019 Paul N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo, Matthew Mattina

Over a suite of six datasets we trained models via transfer learning with an accuracy loss of $<1\%$ resulting in up to 11.2 TOPS/W - nearly $2 \times$ more efficient than a conventional programmable CNN accelerator of the same area.

General Classification Image Classification +1

Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning

no code implementations 4 Dec 2018 Paul Whatmough, Chuteng Zhou, Patrick Hansen, Matthew Mattina

On-device CNN inference for real-time computer vision applications can result in computational demands that far exceed the energy budgets of mobile devices.

Image Classification Transfer Learning

SCALE-Sim: Systolic CNN Accelerator

8 code implementations 16 Oct 2018 Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna

Systolic Arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications.

Distributed, Parallel, and Cluster Computing Hardware Architecture

Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision

no code implementations 29 Mar 2018 Yuhao Zhu, Anand Samajdar, Matthew Mattina, Paul Whatmough

Specifically, we propose to expose the motion data that is naturally generated by the Image Signal Processor (ISP) early in the vision pipeline to the CNN engine.

Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective

no code implementations 19 Jan 2018 Yuhao Zhu, Matthew Mattina, Paul Whatmough

Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR, ADAS, etc.
