Search Results for author: Paul N. Whatmough

Found 17 papers, 5 papers with code

FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning

1 code implementation 27 Feb 2019 Paul N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo, Matthew Mattina

Over a suite of six datasets, we trained models via transfer learning with an accuracy loss of $<1\%$, resulting in up to 11.2 TOPS/W, nearly $2\times$ more efficient than a conventional programmable CNN accelerator of the same area.

General Classification, Image Classification, +1
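
For readers unfamiliar with the setup the FixyNN entry above describes, a shared, fixed feature-extractor "frontend" (which FixyNN hardens into fixed-weight hardware) is combined with a small trainable per-task "backend" via transfer learning. Below is a minimal software-side sketch of that split; the backbone, split point, and 10-class target task are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: frozen shared frontend + trainable per-task backend.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")

# Frozen frontend: the first few inverted-residual blocks (illustrative split).
frontend = nn.Sequential(*list(backbone.features.children())[:7])
for p in frontend.parameters():
    p.requires_grad = False  # fixed weights, analogous to the fixed datapath

# Trainable backend: remaining blocks plus a new task-specific classifier.
backend = nn.Sequential(
    *list(backbone.features.children())[7:],
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(backbone.last_channel, 10),  # e.g. a 10-class target dataset
)

optimizer = torch.optim.SGD(backend.parameters(), lr=1e-2, momentum=0.9)

def forward(x):
    with torch.no_grad():          # the frontend never updates
        feats = frontend(x)
    return backend(feats)
```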

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

no code implementations NeurIPS 2019 Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul N. Whatmough

The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment.

BIG-bench Machine Learning, Neural Architecture Search

ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems

no code implementations 18 Nov 2019 Patrick Hansen, Alexey Vilkin, Yury Khrustalev, James Imber, David Hanwell, Matthew Mattina, Paul N. Whatmough

In this work, we investigate the efficacy of the ISP in CNN classification tasks, and outline the system-level trade-offs between prediction accuracy and computational cost.

Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation

no code implementations 14 Jan 2020 Chuteng Zhou, Prad Kadambi, Matthew Mattina, Paul N. Whatmough

Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning.

Knowledge Distillation
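
The Noisy Machines entry combines two ideas: injecting random continuous noise into the weights during training to emulate analog hardware errors, and distilling from a clean teacher to recover lost accuracy. The sketch below shows that training-step structure under illustrative assumptions (noise scale, temperature, and loss weighting are placeholders, not the paper's reported settings).

```python
# Hedged sketch: noise-injection training combined with knowledge distillation.
import torch
import torch.nn.functional as F

def noisy_distillation_step(student, teacher, x, targets, optimizer,
                            rel_sigma=0.05, T=4.0, alpha=0.9):
    # 1) Perturb the student's weights with Gaussian noise, emulating random
    #    continuous weight errors on an analog accelerator.
    saved = {}
    with torch.no_grad():
        for name, p in student.named_parameters():
            if p.dim() > 1:                       # weight matrices / kernels
                saved[name] = p.detach().clone()
                p.add_(torch.randn_like(p) * rel_sigma * p.abs().mean())

    # 2) Distillation loss: match a clean teacher's soft targets plus the
    #    ordinary cross-entropy on the labels.
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    loss = alpha * soft + (1 - alpha) * F.cross_entropy(s_logits, targets)

    # 3) Backpropagate through the noisy weights, then restore the clean
    #    weights and apply the update to them.
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():
        for name, p in student.named_parameters():
            if name in saved:
                p.copy_(saved[name])
    optimizer.step()
    return loss.item()
```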

Compressing Language Models using Doped Kronecker Products

no code implementations 24 Jan 2020 Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu

Kronecker Products (KP) have been used to compress IoT RNN applications by 15-38x compression factors, achieving better results than traditional compression methods.

Language Modelling, Large Language Model
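
To make the compression factors quoted above concrete: a Kronecker-product factorization replaces a large weight matrix W with W = kron(A, B), so only the two small factors are stored and trained. A minimal sketch follows; the 256x256 target shape is an illustrative assumption.

```python
# Hedged sketch of Kronecker-product (KP) weight compression.
import torch

# Target dense shape: 256 x 256 (e.g. one gate matrix of an RNN).
A = torch.randn(16, 16, requires_grad=True)
B = torch.randn(16, 16, requires_grad=True)

W = torch.kron(A, B)             # 256 x 256, but only 2 * 16 * 16 = 512 params
dense_params = W.numel()         # 65,536
kp_params = A.numel() + B.numel()
print(f"compression factor: {dense_params / kp_params:.0f}x")  # 128x

# Use it like an ordinary weight matrix; gradients flow back to A and B.
x = torch.randn(8, 256)
y = x @ torch.kron(A, B).T
```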

Searching for Winograd-aware Quantized Networks

1 code implementation 25 Feb 2020 Javier Fernandez-Marques, Paul N. Whatmough, Andrew Mundy, Matthew Mattina

Lightweight architectural designs of Convolutional Neural Networks (CNNs) together with quantization have paved the way for the deployment of demanding computer vision applications on mobile devices.

Neural Architecture Search, Quantization

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

no code implementations 16 May 2020 Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM).
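
The INT8 GEMM workload named in the entry above follows a standard arithmetic pattern: 8-bit operands, 32-bit accumulation, then requantization back to 8 bits. The NumPy sketch below illustrates that pattern only; shapes and the output scale are illustrative assumptions, not details of the accelerator.

```python
# Hedged sketch of an INT8 GEMM with int32 accumulation and requantization.
import numpy as np

M, K, N = 4, 64, 8
a = np.random.randint(-128, 128, size=(M, K), dtype=np.int8)
b = np.random.randint(-128, 128, size=(K, N), dtype=np.int8)

# Accumulate in int32 to avoid overflow (K * 127 * 127 fits easily in 32 bits).
acc = a.astype(np.int32) @ b.astype(np.int32)

# Requantize: scale the int32 accumulator back into the int8 range.
scale = 2.0 ** -10   # illustrative fixed-point output scale
out = np.clip(np.round(acc * scale), -128, 127).astype(np.int8)
```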

TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

1 code implementation 20 May 2020 Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough

Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs).

Model Compression, Quantization, +1

Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration

no code implementations 4 Sep 2020 Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

In this paper, we address a key architectural challenge with structural sparsity: how to provide support for a range of sparsity levels while maintaining high utilization of the hardware.

Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices

no code implementations 14 Feb 2021 Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu

Additionally, results with doped Kronecker product matrices demonstrate state-of-the-art accuracy at large compression factors (10-25x) across 4 natural language processing applications, with minor loss in accuracy.
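
"Doping" here augments the Kronecker-product factorization with a very sparse additive matrix, i.e. W = kron(A, B) + S, so a handful of free parameters can correct the most damaging approximation errors. A minimal sketch under illustrative assumptions (shapes, 1% density, dense storage of S for simplicity):

```python
# Hedged sketch of a doped Kronecker-product weight: W = kron(A, B) + S.
import torch

A = torch.randn(16, 16, requires_grad=True)
B = torch.randn(16, 16, requires_grad=True)

# Sparse "dope" matrix: fix a random 1% support, train only those values.
# (Stored densely here for simplicity; a real implementation would use a
# sparse format so only the ~1% surviving entries are kept.)
density = 0.01
mask = (torch.rand(256, 256) < density).float()
S_values = torch.randn(256, 256, requires_grad=True)

def doped_weight():
    return torch.kron(A, B) + S_values * mask  # masked-out entries stay zero

x = torch.randn(8, 256)
y = x @ doped_weight().T
```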

Fast and Accurate: Video Enhancement using Sparse Depth

no code implementations 15 Mar 2021 Yu Feng, Patrick Hansen, Paul N. Whatmough, Guoyu Lu, Yuhao Zhu

This paper presents a general framework to build fast and accurate algorithms for video enhancement tasks such as super-resolution, deblurring, and denoising.

Deblurring, Denoising, +4

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration

no code implementations 16 Jul 2021 Zhi-Gang Liu, Paul N. Whatmough, Yuhao Zhu, Matthew Mattina

We propose to exploit structured sparsity, more specifically, Density Bound Block (DBB) sparsity for both weights and activations.
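
Density Bound Block (DBB) sparsity, as named in the S2TA entry above, constrains every fixed-size block of a tensor to at most N non-zero values, giving the hardware a guaranteed worst-case density per block. The pruning sketch below illustrates the constraint only; the block size, bound, and blocking along flattened rows are illustrative assumptions.

```python
# Hedged sketch of Density Bound Block (DBB) pruning: keep at most
# `max_nonzeros` largest-magnitude entries in every block of `block` values.
import torch

def dbb_prune(w: torch.Tensor, block: int = 8, max_nonzeros: int = 2):
    flat = w.reshape(-1, block)                       # group into blocks
    idx = flat.abs().topk(max_nonzeros, dim=1).indices
    mask = torch.zeros_like(flat)
    mask.scatter_(1, idx, 1.0)                        # 1s mark the survivors
    return (flat * mask).reshape(w.shape)

w = torch.randn(64, 64)
w_dbb = dbb_prune(w)          # at most 2 of every 8 consecutive values survive
print((w_dbb != 0).float().mean())   # ~0.25 density bound
```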

Federated Learning Based on Dynamic Regularization

3 code implementations ICLR 2021 Durmus Alp Emre Acar, Yue Zhao, Ramon Matas Navarro, Matthew Mattina, Paul N. Whatmough, Venkatesh Saligrama

We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round.

Federated Learning
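
The round structure described in this entry (server samples a random subset of devices, each trains locally, the server aggregates) is sketched below. The paper's actual contribution, a dynamic regularization term added to each device's local objective, is indicated only by a placeholder comment; the client count, sampling fraction, local steps, and the `sample_batch` client API are illustrative assumptions.

```python
# Hedged sketch of one federated round with a randomly chosen device subset.
import copy
import random
import torch

def federated_round(global_model, clients, frac=0.1, local_steps=5, lr=0.1):
    selected = random.sample(clients, max(1, int(frac * len(clients))))
    client_states = []
    for client in selected:
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for _ in range(local_steps):
            x, y = client.sample_batch()          # assumed client API
            loss = torch.nn.functional.cross_entropy(local(x), y)
            # FedDyn would add its dynamic regularization term to `loss` here.
            opt.zero_grad()
            loss.backward()
            opt.step()
        client_states.append(local.state_dict())

    # Server aggregation: average the selected devices' parameters.
    avg = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```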

AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator

no code implementations 10 Nov 2021 Chuteng Zhou, Fernando Garcia Redondo, Julian Büchel, Irem Boybat, Xavier Timoneda Comas, S. R. Nandakumar, Shidhartha Das, Abu Sebastian, Manuel Le Gallo, Paul N. Whatmough

We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator, with a novel layer-serial approach to remove the cost of complex interconnects associated with a fully-pipelined design.

Keyword Spotting
