no code implementations • 14 Apr 2024 • Abhishek Tyagi, Reiley Jeyapaul, Chuteng Zhou, Paul Whatmough, Yuhao Zhu
As Neural Processing Units (NPUs) or accelerators are increasingly deployed in a variety of applications, including safety-critical domains such as autonomous vehicles and medical imaging, it is critical to understand the fault tolerance of NPUs.
no code implementations • 23 Feb 2024 • Mart van Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort, Paul Whatmough
In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality.
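The general idea behind higher quantization dimensionality can be illustrated with a toy comparison (this is only a sketch of the concept, not the paper's actual method): at the same bit budget, quantizing weights jointly in pairs with a learned codebook typically yields lower reconstruction error than quantizing each weight independently on a uniform grid.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(20000)  # stand-in for a flattened weight tensor

# --- 1-D (scalar) uniform quantization at 4 bits per weight ---
levels = 2 ** 4
lo, hi = w.min(), w.max()
step = (hi - lo) / (levels - 1)
w_scalar = lo + np.round((w - lo) / step) * step
scalar_mse = np.mean((w - w_scalar) ** 2)

# --- 2-D vector quantization at the same budget: 8 bits per weight pair ---
pairs = w.reshape(-1, 2)
k = 2 ** 8  # 256 codewords over pairs = 4 bits per weight
centroids = pairs[rng.choice(len(pairs), size=k, replace=False)].copy()
for _ in range(25):  # plain Lloyd (k-means) iterations
    dists = ((pairs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(1)
    for c in range(k):
        members = pairs[assign == c]
        if len(members):
            centroids[c] = members.mean(0)
w_vq = centroids[assign].reshape(-1)
vq_mse = np.mean((w - w_vq) ** 2)

print(f"scalar MSE: {scalar_mse:.4f}, 2-D VQ MSE: {vq_mse:.4f}")
```

At an equal rate of 4 bits per weight, the 2-D codebook exploits the joint distribution of weight pairs and reaches a noticeably lower mean-squared error than the scalar grid, which is the trade-off improvement the abstract refers to.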
1 code implementation • International Conference on Learning Representations 2023 • Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama
Training a hybrid learner is difficult since we lack annotations of hard edge-examples.
no code implementations • 26 Jan 2023 • Yuji Chai, Devashree Tripathy, Chuteng Zhou, Dibakar Gope, Igor Fedorov, Ramon Matas, David Brooks, Gu-Yeon Wei, Paul Whatmough
The ability to accurately predict deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the design of DNN based models.
no code implementations • 5 Dec 2022 • Abhishek Tyagi, Yiming Gan, Shaoshan Liu, Bo Yu, Paul Whatmough, Yuhao Zhu
As Deep Neural Networks (DNNs) are increasingly deployed in safety-critical and privacy-sensitive applications such as autonomous driving and biometric authentication, it is critical to understand the fault tolerance of DNNs.
1 code implementation • 17 Aug 2022 • Kartikeya Bhardwaj, James Ward, Caleb Tung, Dibakar Gope, Lingchuan Meng, Igor Fedorov, Alex Chalfin, Paul Whatmough, Danny Loh
To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs) that manipulate the amount of non-linearity in models to improve their hardware-awareness and efficiency.
no code implementations • 15 Jan 2022 • Igor Fedorov, Ramon Matas, Hokchhay Tann, Chuteng Zhou, Matthew Mattina, Paul Whatmough
Deploying TinyML models on low-cost IoT hardware is very challenging due to limited device memory capacity.
1 code implementation • 29 Dec 2021 • Kartikeya Bhardwaj, Dibakar Gope, James Ward, Paul Whatmough, Danny Loh
Autonomous systems are highly vulnerable to a variety of adversarial attacks on Deep Neural Networks (DNNs).
1 code implementation • 29 Sep 2021 • Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama
The first network is a low-capacity network that can be deployed on an edge device, whereas the second is a high-capacity network deployed in the cloud.
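A common baseline for such an edge/cloud cascade is a simple confidence-threshold router: the low-capacity edge model handles an input itself when its prediction is confident, and defers to the high-capacity cloud model otherwise. The sketch below shows only this baseline, with an illustrative threshold; the paper's contribution is to *learn* the routing jointly with the two networks.

```python
import numpy as np

def route_to_cloud(edge_probs, threshold=0.8):
    """Return a boolean mask marking inputs the edge model should defer.

    `edge_probs` holds per-input softmax outputs from a hypothetical
    edge model; `threshold` is an illustrative confidence cutoff.
    """
    confidence = edge_probs.max(axis=1)  # top-class probability per input
    return confidence < threshold

# Toy softmax outputs from an assumed edge model for four inputs.
edge_probs = np.array([
    [0.95, 0.03, 0.02],   # confident  -> keep on device
    [0.40, 0.35, 0.25],   # ambiguous  -> escalate to cloud
    [0.85, 0.10, 0.05],   # confident  -> keep on device
    [0.50, 0.45, 0.05],   # ambiguous  -> escalate to cloud
])
deferred = route_to_cloud(edge_probs)
print(deferred)
```

Only the deferred inputs incur cloud latency and bandwidth, so the threshold directly trades accuracy against communication cost.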
no code implementations • 5 Feb 2021 • Srivatsan Krishnan, Zishen Wan, Kshitij Bhardwaj, Paul Whatmough, Aleksandra Faust, Sabrina Neuman, Gu-Yeon Wei, David Brooks, Vijay Janapa Reddi
Balancing a computing system for a UAV requires considering both the cyber (e.g., sensor rate, compute performance) and physical (e.g., payload weight) characteristics that affect overall performance.
1 code implementation • 16 Aug 2020 • Yu Feng, Boyuan Tian, Tiancheng Xu, Paul Whatmough, Yuhao Zhu
Point cloud analytics is poised to become a key workload on battery-powered embedded and mobile platforms in a wide range of emerging application domains, such as autonomous driving, robotics, and augmented reality, where efficiency is paramount.
2 code implementations • 13 Jan 2020 • Paul Whatmough, Marco Donato, Glenn Ko, Sae-Kyu Lee, David Brooks, Gu-Yeon Wei
The current trend for domain-specific architectures (DSAs) has led to renewed interest in research test chips to demonstrate new specialized hardware.
no code implementations • 10 Dec 2019 • Sam Likun Xi, Yuan Yao, Kshitij Bhardwaj, Paul Whatmough, Gu-Yeon Wei, David Brooks
In recent years, there have been tremendous advances in hardware acceleration of deep neural networks.
2 code implementations • 15 Nov 2019 • Yu Feng, Paul Whatmough, Yuhao Zhu
The key to ASV is to exploit unique characteristics inherent to stereo vision, and apply stereo-specific optimizations, both algorithmically and computationally.
no code implementations • 4 Dec 2018 • Paul Whatmough, Chuteng Zhou, Patrick Hansen, Matthew Mattina
On-device CNN inference for real-time computer vision applications can result in computational demands that far exceed the energy budgets of mobile devices.
8 code implementations • 16 Oct 2018 • Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna
Systolic Arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications.
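The dense-matmul efficiency of a systolic array comes from its dataflow: operands enter at the array edges with a per-row/per-column skew and hop between neighboring processing elements (PEs) each cycle, so every value is reused across a whole row or column of PEs. A cycle-level sketch of an output-stationary array (one of the dataflows such simulators model; function and variable names here are illustrative, not the tool's API) is:

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-level sketch of an output-stationary systolic array.

    A is M x K, B is K x N; one PE per output element C[i, j].
    A streams in from the left (row i skewed by i cycles), B from the
    top (column j skewed by j cycles); each PE accumulates in place.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))    # stationary partial sums, one per PE
    a_reg = np.zeros((M, N))  # operand registers flowing left -> right
    b_reg = np.zeros((M, N))  # operand registers flowing top -> bottom
    for t in range(K + M + N - 2):  # total pipeline latency in cycles
        a_new = np.zeros_like(a_reg)
        b_new = np.zeros_like(b_reg)
        for i in range(M):
            for j in range(N):
                # Boundary PEs read the skewed input streams; inner PEs
                # read the neighbor's register from the previous cycle.
                if j == 0:
                    a_in = A[i, t - i] if 0 <= t - i < K else 0.0
                else:
                    a_in = a_reg[i, j - 1]
                if i == 0:
                    b_in = B[t - j, j] if 0 <= t - j < K else 0.0
                else:
                    b_in = b_reg[i - 1, j]
                acc[i, j] += a_in * b_in  # multiply-accumulate in the PE
                a_new[i, j] = a_in
                b_new[i, j] = b_in
        a_reg, b_reg = a_new, b_new   # all registers update simultaneously
    return acc
```

Because of the skew, PE(i, j) sees A[i, k] and B[k, j] together at cycle t = i + j + k, so after K + M + N - 2 cycles every PE holds its dot product and the result matches a direct matrix multiply.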
no code implementations • 29 Mar 2018 • Yuhao Zhu, Anand Samajdar, Matthew Mattina, Paul Whatmough
Specifically, we propose to expose the motion data that is naturally generated by the Image Signal Processor (ISP) early in the vision pipeline to the CNN engine.
no code implementations • 19 Jan 2018 • Yuhao Zhu, Matthew Mattina, Paul Whatmough
Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR and ADAS.