Search Results for author: Mohamed S. Abdelfattah

Found 17 papers, 6 papers with code

Encodings for Prediction-based Neural Architecture Search

1 code implementation • 4 Mar 2024 • Yash Akhauri, Mohamed S. Abdelfattah

Building on our study, we present our predictor FLAN: Flow Attention for NAS.

Neural Architecture Search · Transfer Learning

On Latency Predictors for Neural Architecture Search

1 code implementation • 4 Mar 2024 • Yash Akhauri, Mohamed S. Abdelfattah

We then design a general latency predictor to comprehensively study (1) the predictor architecture, (2) NN sample selection methods, (3) hardware device representations, and (4) NN operation encoding schemes.

Hardware Aware Neural Architecture Search · Meta-Learning +2
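For context, the simplest latency model such learned predictors improve upon is purely additive: sum per-operation latencies measured once on the device. A hypothetical baseline sketch (operation names and latencies are illustrative, not from the paper):

```python
def predict_latency(op_sequence, per_op_latency_ms):
    """Naive additive latency model: total latency is the sum of
    per-operation latencies measured once on the target device.
    A hypothetical baseline only -- the paper's predictors are learned
    models that also encode the hardware device and NN operations."""
    return sum(per_op_latency_ms[op] for op in op_sequence)

# illustrative per-op measurements (made up for the sketch)
table = {"conv3x3": 1.2, "relu": 0.1, "fc": 0.4}
total = predict_latency(["conv3x3", "relu", "fc"], table)
```

This baseline ignores operator fusion and memory effects, which is exactly why learned predictors with richer device and operation encodings are studied.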

Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision

no code implementations • 2 Mar 2024 • Ahmed F. AbouElhamayed, Susanne Balle, Deshanand Singh, Mohamed S. Abdelfattah

Our results consistently demonstrate that end-to-end application performance can easily be dominated by data processing and data movement functions (up to 56% of end-to-end latency for a medium-sized image, and ~80% impact on system throughput for a large image), even though these functions have been conventionally overlooked in deep learning system design.

Depth Estimation · Image Classification

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search

no code implementations • 7 Aug 2023 • Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li

With the proposed integer quantization search, we increase the accuracy of ResNet-18 on ImageNet by 1.31 percentage points and ResNet-50 by 0.90 percentage points at equivalent model cost relative to previous methods.

Quantization
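The primitive that a mixed-precision search assigns per layer is uniform integer quantization at a chosen bit-width. A generic sketch of symmetric fake quantization (this is the standard primitive, not the FLIQS search itself):

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric uniform fake quantization of a weight tensor.

    Generic sketch: map weights to a (2**bits)-level integer grid and
    back, so quantization error can be simulated in floating point.
    A mixed-precision search would pick `bits` per layer."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit
    m = float(np.abs(w).max())
    scale = m / qmax if m > 0 else 1.0      # avoid divide-by-zero on all-zero w
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                         # dequantized values
```

The rounding error per weight is bounded by half a quantization step (scale / 2), which is what makes higher bit-widths more accurate but more expensive.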

DiviML: A Module-based Heuristic for Mapping Neural Networks onto Heterogeneous Platforms

no code implementations • 31 Jul 2023 • Yassine Ghannane, Mohamed S. Abdelfattah

We evaluate our scheduler in optimizing both conventional DNNs and randomly-wired neural networks, subject to latency and throughput constraints, on a heterogeneous system comprising a CPU and two distinct GPUs.

Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search

no code implementations • 4 Jun 2023 • Yash Akhauri, Mohamed S. Abdelfattah

Many hardware-aware neural architecture search (NAS) methods have been developed to optimize the topology of neural networks (NN) with the joint objectives of higher accuracy and lower latency.

Hardware Aware Neural Architecture Search · Meta-Learning +1

PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration

1 code implementation • 25 May 2023 • Ahmed F. AbouElhamayed, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed S. Abdelfattah

We identify PQ configurations that improve performance-per-area for ResNet20 by up to 3.1×, even when compared to a highly optimized conventional DNN accelerator, with similar improvements on two additional compact DNNs.

Quantization
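The core inference trick behind product quantization is replacing dot products with small table lookups: each weight row is stored as codeword indices, and the input is dotted against every codeword once per subspace. A hedged NumPy sketch of this idea (a generic PQ matrix-vector product, not PQA's accelerator datapath):

```python
import numpy as np

def pq_matvec(codes, codebooks, x):
    """Approximate W @ x when W's rows are product-quantized.

    codes:     (out_dim, n_sub) int array, codeword index per subspace
    codebooks: (n_sub, k, sub_dim) float array of centroids
    x:         (n_sub * sub_dim,) input vector

    Sketch of the PQ inference idea: build one small dot-product lookup
    table per subspace, then each output element is a sum of n_sub
    table lookups instead of a full dot product."""
    n_sub, k, sub_dim = codebooks.shape
    xs = x.reshape(n_sub, sub_dim)
    # tables[s, c] = <codebooks[s, c], x_s>, computed once per input
    tables = np.einsum("skd,sd->sk", codebooks, xs)
    # gather: output o sums tables[s, codes[o, s]] over subspaces s
    return tables[np.arange(n_sub), codes].sum(axis=1)
```

When every weight row is exactly a concatenation of codewords, the result matches the dense product; in practice codewords approximate the rows, trading accuracy for multiply-free lookups.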

Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design

no code implementations • 20 Sep 2022 • Hongxiang Fan, Thomas Chau, Stylianos I. Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed S. Abdelfattah

By jointly optimizing the algorithm and hardware, our FPGA-based butterfly accelerator achieves a 14.2 to 23.2 times speedup over state-of-the-art accelerators normalized to the same computational budget.

Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

1 code implementation • 4 Dec 2021 • Erwei Wang, James J. Davis, Georgios-Ilias Stavrou, Peter Y. K. Cheung, George A. Constantinides, Mohamed S. Abdelfattah

To address these issues, we propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference.

Efficient Neural Network

Temporal Kernel Consistency for Blind Video Super-Resolution

no code implementations • 18 Aug 2021 • Lichuan Xiang, Royson Lee, Mohamed S. Abdelfattah, Nicholas D. Lane, Hongkai Wen

Deep learning-based blind super-resolution (SR) methods have recently achieved unprecedented performance in upscaling frames with unknown degradation.

Blind Super-Resolution · Video Super-Resolution

Zero-Cost Operation Scoring in Differentiable Architecture Search

no code implementations • 12 Jun 2021 • Lichuan Xiang, Łukasz Dudziak, Mohamed S. Abdelfattah, Thomas Chau, Nicholas D. Lane, Hongkai Wen

From this perspective, we introduce a novel perturbation-based zero-cost operation scoring (Zero-Cost-PT) approach, which utilizes zero-cost proxies that were recently studied in multi-trial NAS but degrade significantly on larger search spaces, typical for differentiable NAS.

Neural Architecture Search
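To make "zero-cost proxy" concrete: these are scores computed from a single forward/backward pass, without training. A hedged sketch of a synflow-style proxy restricted to a stack of linear layers (real proxies use autograd on full nonlinear networks; this is the proxy family discussed, not the paper's exact method):

```python
import numpy as np

def synflow_score(weights, in_dim):
    """Synflow-style zero-cost proxy for a stack of linear layers:
    run |W| layers on an all-ones input, take loss = sum of outputs,
    and score the network by sum over parameters of |param * grad|.
    Sketch only -- real proxies handle conv/nonlinear nets via autograd."""
    A = [np.abs(W) for W in weights]
    acts = [np.ones(in_dim)]          # forward pass, keeping activations
    for W in A:
        acts.append(W @ acts[-1])
    g = np.ones_like(acts[-1])        # dL/d(output) = 1 for L = sum(out)
    score = 0.0
    for W, a in zip(reversed(A), reversed(acts[:-1])):
        score += np.sum(np.abs(np.outer(g, a) * W))  # sum |W * dL/dW|
        g = W.T @ g                   # backprop through the linear layer
    return float(score)
```

The score costs one forward/backward pass, which is why such proxies are attractive for scoring candidate operations cheaply.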

Zero-Cost Proxies for Lightweight NAS

2 code implementations • ICLR 2021 • Mohamed S. Abdelfattah, Abhinav Mehrotra, Łukasz Dudziak, Nicholas D. Lane

For example, Spearman's rank correlation coefficient between final validation accuracy and our best zero-cost proxy on NAS-Bench-201 is 0.82, compared to 0.61 for EcoNAS (a recently proposed reduced-training proxy).

Neural Architecture Search
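The Spearman coefficient quoted above is just the Pearson correlation of ranks; with no ties it has a simple closed form. A minimal sketch, assuming tie-free scores:

```python
def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Assumes no ties, which permits the classic closed form
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the rank difference of pair i."""
    n = len(xs)
    rank = lambda vals: {v: i for i, v in enumerate(sorted(vals))}
    rx, ry = rank(xs), rank(ys)
    d2 = sum((rx[x] - ry[y]) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A value of 1.0 means the proxy ranks architectures identically to final accuracy; 0.82 means the cheap proxy preserves most of that ordering.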

BRP-NAS: Prediction-based NAS using GCNs

2 code implementations • NeurIPS 2020 • Łukasz Dudziak, Thomas Chau, Mohamed S. Abdelfattah, Royson Lee, Hyeji Kim, Nicholas D. Lane

What is more, we investigate prediction quality on different metrics and show that sample efficiency of the predictor-based NAS can be improved by considering binary relations of models and an iterative data selection strategy.

Neural Architecture Search
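The "binary relations" idea above recasts accuracy regression as pairwise classification: instead of predicting each model's accuracy, predict which of two models is better. A sketch of how such training labels are derived (the labeling view only, not the paper's GCN predictor):

```python
from itertools import combinations

def binary_relations(accuracies):
    """Turn per-model accuracies into pairwise labels: for every model
    pair (i, j) with i < j, record whether model i beats model j.
    Sketch of the binary-relation training signal used by
    predictor-based NAS; the predictor itself is a learned model."""
    return [((i, j), accuracies[i] > accuracies[j])
            for i, j in combinations(range(len(accuracies)), 2)]
```

Ranking pairs is often easier to learn sample-efficiently than regressing absolute accuracy, since NAS ultimately only needs the ordering of candidates.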

Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator

no code implementations • 11 Feb 2020 • Mohamed S. Abdelfattah, Łukasz Dudziak, Thomas Chau, Royson Lee, Hyeji Kim, Nicholas D. Lane

We automate HW-CNN codesign using NAS by including parameters from both the CNN model and the HW accelerator, and we jointly search for the best model-accelerator pair that boosts accuracy and efficiency.

General Classification · Image Classification +2

DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration

no code implementations • 13 Jul 2018 • Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O'Connell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, Gordon R. Chiu

Overlays have shown significant promise for field-programmable gate-arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow.

Distributed, Parallel, and Cluster Computing · Hardware Architecture · Signal Processing
