Search Results for author: Paul Whatmough

Found 23 papers, 8 papers with code

HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations

no code implementations • 11 Jun 2025 • Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel

Diffusion models represent the cutting edge in image generation, but their high memory and computational demands hinder deployment on resource-constrained devices.

Image Generation • Quantization

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking

no code implementations • 2 Dec 2024 • Marco Federici, Davide Belli, Mart van Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul Whatmough

Previous work has proposed to leverage natural dynamic activation sparsity in ReLU-activated LLMs to reduce effective DRAM bandwidth per token.

Language Modeling • Language Modelling +1
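
The snippet above mentions exploiting ReLU activation sparsity to cut DRAM bandwidth. A minimal illustrative sketch (not the paper's actual method; all names are hypothetical): in a ReLU FFN, neurons with zero activation contribute nothing, so the matching rows of the down-projection never need to be fetched from memory.

```python
import numpy as np

def sparse_ffn(x, W_up, W_down):
    """Hypothetical sketch: rows of W_down whose ReLU activations
    are zero would never be read from DRAM."""
    h = np.maximum(W_up @ x, 0.0)      # ReLU -> many exact zeros
    active = np.nonzero(h)[0]          # indices of live neurons
    # only these rows of W_down would be fetched from memory
    y = W_down[:, active] @ h[active]
    return y, len(active)
```

The result is numerically identical to the dense computation; the saving is in how many weight rows must cross the memory bus per token.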

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference

no code implementations • 27 Nov 2024 • Andrii Skliar, Ties van Rozendaal, Romain Lepert, Todor Boinovski, Mart van Baalen, Markus Nagel, Paul Whatmough, Babak Ehteshami Bejnordi

Mixture of Experts (MoE) LLMs have recently gained attention for their ability to enhance performance by selectively engaging specialized subnetworks or "experts" for each input.

GSM8K • Language Modeling +3
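
The snippet describes MoE models selectively engaging expert subnetworks per input. As a generic sketch of that routing idea (standard top-k gating, not this paper's cache-conditional scheme; all names are hypothetical):

```python
import numpy as np

def moe_forward(x, gate_W, experts, top_k=2):
    # Hypothetical sketch of sparse MoE routing: the gate scores all
    # experts, but only the top-k expert subnetworks actually run.
    logits = x @ gate_W                       # one score per expert
    top = np.argsort(logits)[-top_k:]         # selected expert indices
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # softmax over survivors
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```

Only the selected experts' weights are needed for a given token, which is what makes expert choice interact with what is already resident in an on-device cache.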

Rapid Switching and Multi-Adapter Fusion via Sparse High Rank Adapters

no code implementations • 22 Jul 2024 • Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Viswanath Ganapathy, Rafael Esteves, Shreya Kadambi, Shubhankar Borse, Paul Whatmough, Risheek Garrepalli, Mart van Baalen, Harris Teague, Markus Nagel

In this paper, we propose Sparse High Rank Adapters (SHiRA), which directly finetune 1-2% of the base model weights while leaving the others unchanged, thus resulting in a highly sparse adapter.
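
A minimal sketch of the idea as described in the snippet (not the paper's training procedure; mask selection and values here are random placeholders): the adapter is a sparse delta over a small fixed fraction of the base weights, so fusing or switching adapters is just a sparse add or subtract.

```python
import numpy as np

def make_shira_adapter(W_base, sparsity=0.02, seed=0):
    # Hypothetical sketch: an adapter touching only ~1-2% of the base
    # weights, stored as a sparse delta over a fixed mask.
    rng = np.random.default_rng(seed)
    mask = rng.random(W_base.shape) < sparsity      # trainable positions
    delta = np.where(mask, 0.01 * rng.standard_normal(W_base.shape), 0.0)
    return mask, delta

def fuse(W_base, delta):
    # Fusing is a sparse add, so rapid switching amounts to
    # subtracting one adapter's delta and adding another's.
    return W_base + delta
```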

Sparse High Rank Adapters

no code implementations • 19 Jun 2024 • Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Viswanath Ganapathy, Shreya Kadambi, Rafael Esteves, Shubhankar Borse, Paul Whatmough, Risheek Garrepalli, Mart van Baalen, Harris Teague, Markus Nagel

However, from a mobile deployment standpoint, we can either avoid inference overhead in the fused mode but lose the ability to switch adapters rapidly, or suffer significantly higher (up to 30%) inference latency while enabling rapid switching in the unfused mode.

Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator

no code implementations • 14 Apr 2024 • Abhishek Tyagi, Reiley Jeyapaul, Chuteng Zhu, Paul Whatmough, Yuhao Zhu

As Neural Processing Units (NPUs) or accelerators are increasingly deployed in a variety of applications, including safety-critical ones such as autonomous vehicles and medical imaging, it is critical to understand the fault-tolerance characteristics of NPUs.

Autonomous Vehicles • Navigate

GPTVQ: The Blessing of Dimensionality for LLM Quantization

no code implementations • 23 Feb 2024 • Mart van Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort, Paul Whatmough

In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality.

Quantization
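
"Increasing the quantization dimensionality" means quantizing groups of weights jointly as vectors rather than one scalar at a time. A toy vector quantizer illustrating the idea (a plain k-means codebook, not the GPTVQ algorithm; all names are hypothetical):

```python
import numpy as np

def vq_quantize(W, dim=2, bits=4, iters=8, seed=0):
    # Toy vector quantizer: weights are grouped into dim-sized vectors
    # and each vector is replaced by its nearest k-means codeword, so a
    # whole group of weights is stored as one small index.
    rng = np.random.default_rng(seed)
    vecs = W.reshape(-1, dim)
    k = 2 ** bits
    codebook = vecs[rng.choice(len(vecs), size=k, replace=False)]
    for _ in range(iters):
        dist = ((vecs[:, None, :] - codebook[None]) ** 2).sum(-1)
        idx = dist.argmin(axis=1)
        for c in range(k):
            if (idx == c).any():
                codebook[c] = vecs[idx == c].mean(axis=0)
    dist = ((vecs[:, None, :] - codebook[None]) ** 2).sum(-1)
    idx = dist.argmin(axis=1)
    return codebook[idx].reshape(W.shape), idx, codebook
```

At the same bits-per-weight budget, a shared codebook over vectors can capture correlations between neighboring weights that per-scalar quantization cannot.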

PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices

no code implementations • 26 Jan 2023 • Yuji Chai, Devashree Tripathy, Chuteng Zhou, Dibakar Gope, Igor Fedorov, Ramon Matas, David Brooks, Gu-Yeon Wei, Paul Whatmough

The ability to accurately predict deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the design of DNN-based models.

Graph Neural Network

Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators

no code implementations • 5 Dec 2022 • Abhishek Tyagi, Yiming Gan, Shaoshan Liu, Bo Yu, Paul Whatmough, Yuhao Zhu

As Deep Neural Networks (DNNs) are increasingly deployed in safety-critical and privacy-sensitive applications such as autonomous driving and biometric authentication, it is critical to understand the fault-tolerance characteristics of DNNs.

Autonomous Driving

Restructurable Activation Networks

1 code implementation • 17 Aug 2022 • Kartikeya Bhardwaj, James Ward, Caleb Tung, Dibakar Gope, Lingchuan Meng, Igor Fedorov, Alex Chalfin, Paul Whatmough, Danny Loh

To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs) that manipulate the amount of non-linearity in models to improve their hardware-awareness and efficiency.

object-detection • Object Detection

Hybrid Cloud-Edge Networks for Efficient Inference

1 code implementation • 29 Sep 2021 • Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama

The first network is a low-capacity network that can be deployed on an edge device, whereas the second is a high-capacity network deployed in the cloud.
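A common way to realize such a two-network split (an illustrative sketch, not necessarily this paper's routing rule; the threshold and function names are hypothetical) is confidence-based deferral: the cheap edge network answers when it is confident, and only hard inputs pay the cost of a round trip to the cloud.

```python
import numpy as np

def hybrid_predict(x, edge_model, cloud_model, threshold=0.8):
    # Hypothetical sketch: the low-capacity edge network answers when
    # confident; otherwise the input is forwarded to the cloud network.
    p_edge = edge_model(x)                    # class probabilities
    if p_edge.max() >= threshold:
        return int(p_edge.argmax()), "edge"
    return int(cloud_model(x).argmax()), "cloud"
```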

AutoPilot: Automating SoC Design Space Exploration for SWaP Constrained Autonomous UAVs

no code implementations • 5 Feb 2021 • Srivatsan Krishnan, Zishen Wan, Kshitij Bhardwaj, Paul Whatmough, Aleksandra Faust, Sabrina Neuman, Gu-Yeon Wei, David Brooks, Vijay Janapa Reddi

Balancing a computing system for a UAV requires considering both the cyber (e.g., sensor rate, compute performance) and physical (e.g., payload weight) characteristics that affect overall performance.

Bayesian Optimization • BIG-bench Machine Learning +1

Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation

1 code implementation • 16 Aug 2020 • Yu Feng, Boyuan Tian, Tiancheng Xu, Paul Whatmough, Yuhao Zhu

Point cloud analytics is poised to become a key workload on battery-powered embedded and mobile platforms in a wide range of emerging application domains, such as autonomous driving, robotics, and augmented reality, where efficiency is paramount.

Autonomous Driving

CHIPKIT: An agile, reusable open-source framework for rapid test chip development

2 code implementations • 13 Jan 2020 • Paul Whatmough, Marco Donato, Glenn Ko, Sae-Kyu Lee, David Brooks, Gu-Yeon Wei

The current trend for domain-specific architectures (DSAs) has led to renewed interest in research test chips to demonstrate new specialized hardware.

Hardware Architecture

ASV: Accelerated Stereo Vision System

2 code implementations • 15 Nov 2019 • Yu Feng, Paul Whatmough, Yuhao Zhu

The key to ASV is to exploit unique characteristics inherent to stereo vision, and apply stereo-specific optimizations, both algorithmically and computationally.

Stereo Matching

Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning

no code implementations • 4 Dec 2018 • Paul Whatmough, Chuteng Zhou, Patrick Hansen, Matthew Mattina

On-device CNN inference for real-time computer vision applications can result in computational demands that far exceed the energy budgets of mobile devices.

image-classification • Image Classification +1

SCALE-Sim: Systolic CNN Accelerator

9 code implementations • 16 Oct 2018 • Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna

Systolic Arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications.

Distributed, Parallel, and Cluster Computing • Hardware Architecture
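
The efficiency claim for systolic arrays comes from their wavefront dataflow: operands flow between neighboring processing elements, so an M×K by K×N product completes in roughly M+N+K cycles at full utilization. A toy timing model of an output-stationary array (an illustrative sketch, not the SCALE-Sim simulator itself):

```python
import numpy as np

def systolic_matmul(A, B):
    # Toy wavefront model of an output-stationary systolic array:
    # PE (i, j) consumes operand pair k at cycle t = i + j + k, so the
    # whole product drains in M + N + K - 2 cycles.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))
    cycles = M + N + K - 2
    for t in range(cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j            # which operand pair arrives now
                if 0 <= k < K:
                    acc[i, j] += A[i, k] * B[k, j]
    return acc, cycles
```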

Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision

no code implementations • 29 Mar 2018 • Yuhao Zhu, Anand Samajdar, Matthew Mattina, Paul Whatmough

Specifically, we propose to expose the motion data that is naturally generated by the Image Signal Processor (ISP) early in the vision pipeline to the CNN engine.

Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective

no code implementations • 19 Jan 2018 • Yuhao Zhu, Matthew Mattina, Paul Whatmough

Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR, ADAS, etc.

BIG-bench Machine Learning
