Search Results for author: Ardavan Pedram

Found 6 papers, 2 papers with code

Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators

no code implementations • 27 Jan 2021 • Hamzah Abdel-Aziz, Ali Shafiee, Jong Hoon Shin, Ardavan Pedram, Joseph H. Hassoun

We present novel optimizations that reduce the hardware overhead of floating-point (FP) arithmetic in mixed-precision DNN accelerators.
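
The excerpt does not spell out the optimizations, so below is a minimal sketch of the general mixed-precision idea the title points at: multiply operands in a narrow floating-point format while accumulating in full fp32. The bfloat16-style truncation and all function names are illustrative assumptions, not the paper's hardware design.

```python
import numpy as np

def to_bfloat16(x):
    # Emulate bfloat16 by zeroing the low 16 bits of each fp32 value,
    # keeping the sign bit, the 8 exponent bits, and 7 mantissa bits.
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

def mixed_precision_dot(a, b):
    # Multiply in reduced precision, accumulate in full fp32: the wide
    # accumulator is what keeps the cheap multipliers from hurting accuracy.
    a16, b16 = to_bfloat16(a), to_bfloat16(b)
    acc = np.float32(0.0)
    for x, y in zip(a16, b16):
        acc = np.float32(acc + x * y)
    return acc

print(mixed_precision_dot([0.1, 0.2, 0.3], [1.0, 2.0, 3.0]))
```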

Campfire: Compressible, Regularization-Free, Structured Sparse Training for Hardware Accelerators

no code implementations • 9 Jan 2020 • Noah Gamboa, Kais Kudrolli, Anand Dhoot, Ardavan Pedram

This paper studies structured sparse training of CNNs with a gradual pruning technique that leads to fixed, sparse weight matrices after a set number of epochs.
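
As a concrete illustration of gradual pruning in general (a generic sketch, not necessarily Campfire's exact schedule or structure constraints), the mask below ramps sparsity up over the early epochs and is then frozen, so the weight matrices stay fixed and sparse for the rest of training:

```python
import numpy as np

def sparsity_at(epoch, final_sparsity=0.9, ramp_epochs=30):
    # Linear ramp from dense to the target sparsity, then held constant;
    # the schedule shape and the constants here are assumptions.
    return final_sparsity * min(1.0, epoch / ramp_epochs)

def prune_mask(weights, sparsity):
    # Keep the largest-magnitude weights; zero out the smallest fraction.
    k = int(weights.size * sparsity)
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    threshold = np.partition(np.abs(weights).ravel(), k)[k]
    return np.abs(weights) >= threshold

# Per-epoch loop: recompute the mask while ramping, then freeze it.
W = np.random.randn(256, 256)
for epoch in range(40):
    if epoch <= 30:                       # ramp phase: mask still moving
        mask = prune_mask(W, sparsity_at(epoch))
    W *= mask                             # after the ramp, the mask is fixed
```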

Starfire: Regularization-Free Adversarially-Robust Structured Sparse Training

no code implementations • 25 Sep 2019 • Noah Gamboa, Kais Kudrolli, Anand Dhoot, Ardavan Pedram

We show that our method creates sparse versions of ResNet50 and ResNet50v1.5 on full ImageNet while remaining within a negligible (<1%) margin of accuracy loss.

CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks

no code implementations • 1 Jun 2017 • Yuan-Fang Li, Ardavan Pedram

Our results suggest that smaller networks favor non-batched techniques, while larger networks achieve higher performance with batched operations.
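
To make the batched-versus-non-batched distinction concrete, here is a toy comparison (an illustration of the trade-off, not the CATERPILLAR evaluation): one GEMM over the whole batch versus a loop of per-sample GEMVs for the same layer.

```python
import time
import numpy as np

def forward_batched(W, X):
    # One GEMM over the whole batch: best arithmetic intensity on big layers.
    return X @ W.T

def forward_per_sample(W, X):
    # One GEMV per sample, as in non-batched training techniques.
    return np.stack([W @ x for x in X])

W = np.random.randn(1024, 1024).astype(np.float32)
X = np.random.randn(64, 1024).astype(np.float32)
for f in (forward_batched, forward_per_sample):
    start = time.perf_counter()
    f(W, X)
    print(f.__name__, time.perf_counter() - start)
```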

A Systematic Approach to Blocking Convolutional Neural Networks

1 code implementation • 14 Jun 2016 • Xuan Yang, Jing Pu, Blaine Burton Rister, Nikhil Bhagdikar, Stephen Richardson, Shahar Kvatinsky, Jonathan Ragan-Kelley, Ardavan Pedram, Mark Horowitz

Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many computer vision problems, and many researchers have explored optimized implementations.
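
The paper's subject is choosing blockings for convolution loop nests; the sketch below shows what blocking means here (tile sizes and loop order are arbitrary placeholders, not the schedules the paper derives): output channels and rows are tiled so each tile's working set can fit in a given level of the memory hierarchy.

```python
import numpy as np

def conv2d_blocked(inp, weights, tile_k=16, tile_y=8):
    # inp: (C, H, W); weights: (K, C, R, S); valid convolution, stride 1.
    C, H, W = inp.shape
    K, _, R, S = weights.shape
    OH, OW = H - R + 1, W - S + 1
    out = np.zeros((K, OH, OW), dtype=inp.dtype)
    for k0 in range(0, K, tile_k):            # block over output channels
        for y0 in range(0, OH, tile_y):       # block over output rows
            # Within a tile, the same input rows and filter block are
            # reused, which is the cache locality that blocking buys.
            for k in range(k0, min(k0 + tile_k, K)):
                for y in range(y0, min(y0 + tile_y, OH)):
                    for x in range(OW):
                        out[k, y, x] = np.sum(inp[:, y:y+R, x:x+S] * weights[k])
    return out
```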

EIE: Efficient Inference Engine on Compressed Deep Neural Network

4 code implementations • 4 Feb 2016 • Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

EIE has a processing power of 102 GOPS working directly on a compressed network, corresponding to 3 TOPS on an uncompressed network, and processes the FC layers of AlexNet at 1.88×10^4 frames/sec with a power dissipation of only 600 mW.
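
For a sense of the computation EIE accelerates, here is a software analogue (a sketch of compressed sparse inference in general, not the hardware): the layer is stored column-compressed with a small shared-weight codebook, and only nonzero activations are ever multiplied.

```python
import numpy as np

def sparse_layer(codebook, col_ptr, row_idx, weight_idx, a, out_dim):
    # y = W @ a, with W stored in CSC form: column j's nonzeros live at
    # positions col_ptr[j]..col_ptr[j+1]; each stores a row index and a
    # small index into a shared codebook of weight values.
    y = np.zeros(out_dim, dtype=np.float32)
    for j in np.flatnonzero(a):               # skip zero activations entirely
        aj = a[j]
        for p in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[p]] += codebook[weight_idx[p]] * aj
    return y
```

Both the zero-activation skipping and the codebook lookup mirror the savings that compression enables; the exact encoding here (absolute row indices, array layout) is simplified relative to the paper's format.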
