Search Results for author: William J. Dally

Found 18 papers, 11 papers with code

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training

no code implementations 13 Jun 2022 Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany

Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT).

Quantization
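
The clipping trade-off this abstract refers to can be seen in a small NumPy sketch. The clip value alpha and the 4-bit width below are illustrative assumptions, not the optimal-clipping rule or magnitude-aware gradient the paper derives.

```python
import numpy as np

def clipped_uniform_quantize(x, alpha, n_bits=4):
    """Symmetric uniform quantization of x clipped to [-alpha, alpha].

    A small alpha gives a fine quantization grid but clips large values;
    a large alpha preserves outliers but coarsens the grid. This only
    illustrates the trade-off the paper optimizes during QAT.
    """
    n_levels = 2 ** (n_bits - 1) - 1            # e.g. 7 levels per side for 4 bits
    step = alpha / n_levels                     # quantization step size
    x_clipped = np.clip(x, -alpha, alpha)       # clipping stage
    return np.round(x_clipped / step) * step    # round-to-nearest stage

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=100_000)          # toy weight distribution
for alpha in (0.5, 1.0, 2.0, 4.0):
    mse = np.mean((w - clipped_uniform_quantize(w, alpha)) ** 2)
    print(f"alpha={alpha:3.1f}  MSE={mse:.5f}")
```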

PatchNet -- Short-range Template Matching for Efficient Video Processing

1 code implementation 10 Mar 2021 Huizi Mao, Sibo Zhu, Song Han, William J. Dally

Object recognition is a fundamental problem in many video processing tasks; accurately locating seen objects at low computation cost paves the way for on-device video recognition.

Object Detection +5

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

no code implementations 8 Feb 2021 Steve Dai, Rangharajan Venkatesan, Haoxing Ren, Brian Zimmer, William J. Dally, Brucek Khailany

4-bit weights and 8-bit activations achieve near-full-precision accuracy for both BERT-base and BERT-large on SQuAD while reducing area by 26% compared to an 8-bit baseline.

Math Quantization
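
A minimal sketch of the per-vector scaling idea, under illustrative assumptions (vectors of 16 weights, a plain max-abs scale per vector); the paper's two-level scale factors and the hardware mapping are not modeled.

```python
import numpy as np

def per_vector_quantize(w, vec_len=16, n_bits=4):
    """Quantize each length-`vec_len` vector of w with its own scale factor.

    A per-vector scale adapts to the local dynamic range, so a narrow
    integer format (here 4 bits) loses less accuracy than a single scale
    shared by the whole tensor.
    """
    q_max = 2 ** (n_bits - 1) - 1                       # 7 for signed 4-bit
    vecs = w.reshape(-1, vec_len)                       # group into vectors
    scale = np.abs(vecs).max(axis=1, keepdims=True) / q_max
    scale = np.where(scale == 0, 1.0, scale)            # avoid divide-by-zero
    q = np.clip(np.round(vecs / scale), -q_max, q_max)  # integer codes
    return (q * scale).reshape(w.shape)                 # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=4096)
err = np.mean((w - per_vector_quantize(w)) ** 2)
print(f"per-vector 4-bit MSE: {err:.3e}")
```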

SpArch: Efficient Architecture for Sparse Matrix Multiplication

no code implementations 20 Feb 2020 Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally

We then propose a condensed matrix representation that reduces the number of partial matrices by three orders of magnitude and thus reduces DRAM access by 5.4x.

Hardware Architecture Distributed, Parallel, and Cluster Computing
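
For context on what a "partial matrix" is here: in outer-product SpGEMM, each nonzero column of A multiplied by the matching row of B yields one rank-1 partial matrix, and the result is their sum. The SciPy sketch below only illustrates that baseline; SpArch's condensed representation and its on-chip merger are not modeled, and the sizes and densities are arbitrary.

```python
import numpy as np
from scipy import sparse

A = sparse.random(64, 64, density=0.05, format="csc", random_state=0)
B = sparse.random(64, 64, density=0.05, format="csr", random_state=1)

partials = []
for k in range(A.shape[1]):
    col = A[:, [k]]                    # k-th sparse column of A
    row = B[[k], :]                    # k-th sparse row of B
    if col.nnz and row.nnz:
        partials.append(col @ row)     # one rank-1 partial matrix

C = partials[0]
for P in partials[1:]:
    C = C + P                          # the merging that dominates DRAM traffic

print("partial matrices merged:", len(partials))
print("matches A @ B:", np.allclose(C.toarray(), (A @ B).toarray()))
```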

Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference

1 code implementation Design Automation Conference (DAC) 2019 Angad S. Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J. Dally, C. Thomas Gray

We also introduce an energy model to predict the requirements of high-accuracy AMS hardware running large networks and use it to show that for ADC-dominated designs, there is a direct tradeoff between energy efficiency and network accuracy.

CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video

no code implementations 30 Sep 2018 Huizi Mao, Taeyoung Kong, William J. Dally

Experiments on the KITTI dataset show that CaTDet reduces operation count by 5.1-8.7x with the same mean Average Precision (mAP) as the single-model Faster R-CNN detector and incurs an additional delay of 0.3 frame.

Object Detection

Efficient Sparse-Winograd Convolutional Neural Networks

1 code implementation ICLR 2018 Xingyu Liu, Jeff Pool, Song Han, William J. Dally

First, we move the ReLU operation into the Winograd domain to increase the sparsity of the transformed activations.

Network Pruning
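
The "ReLU in the Winograd domain" idea can be illustrated with a toy 1-D F(2,3) tile. The paper works with 2-D tiles and also learns and prunes weights directly in the transform domain, none of which is modeled here; the input and filter values are arbitrary.

```python
import numpy as np

# Winograd F(2,3) transform matrices: a length-4 input tile and a
# length-3 filter produce 2 outputs via Y = A_T @ ((G @ g) * (B_T @ d)).
B_T = np.array([[1, 0, -1, 0],
                [0, 1,  1, 0],
                [0, -1, 1, 0],
                [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

d = np.array([0.3, -1.2, 0.7, -0.4])   # input activation tile
g = np.array([0.2, -0.5, 0.1])         # filter

# Conventional ordering: ReLU in the spatial domain, then transform.
spatial_relu = A_T @ ((G @ g) * (B_T @ np.maximum(d, 0.0)))

# Sparse-Winograd ordering: ReLU applied to the *transformed* activations,
# so the zeros it creates survive into the element-wise multiply stage.
winograd_relu = A_T @ ((G @ g) * np.maximum(B_T @ d, 0.0))

print("spatial-ReLU tile output :", spatial_relu)
print("Winograd-ReLU tile output:", winograd_relu)
```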

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

3 code implementations ICLR 2018 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.

Federated Learning Image Classification +3
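
A bare-bones sketch of the gradient sparsification with local accumulation that underlies Deep Gradient Compression; the 99.9% sparsity is an illustrative setting, and the paper's momentum correction, gradient clipping and warm-up are omitted.

```python
import numpy as np

def sparsify_gradient(grad, residual, sparsity=0.999):
    """One step of top-k gradient sparsification with local accumulation.

    Only the largest-magnitude entries of (residual + grad) are sent;
    the rest stay in `residual` and accumulate until they grow large
    enough to be transmitted in a later step.
    """
    acc = residual + grad
    k = max(1, int(acc.size * (1.0 - sparsity)))       # entries to send
    thresh = np.partition(np.abs(acc), -k)[-k]
    mask = np.abs(acc) >= thresh
    sent = np.where(mask, acc, 0.0)                    # communicated update
    new_residual = np.where(mask, 0.0, acc)            # kept for later steps
    return sent, new_residual

rng = np.random.default_rng(0)
residual = np.zeros(10_000)
for step in range(3):
    grad = rng.normal(0, 1e-2, size=10_000)
    sent, residual = sparsify_gradient(grad, residual)
    print(f"step {step}: sent {np.count_nonzero(sent)} of {grad.size} values")
```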

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

no code implementations 24 May 2017 Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally

Since memory reference is more than two orders of magnitude more expensive than arithmetic operations, the regularity of sparse structure leads to more efficient hardware design.
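
A small sketch of what "regularity of sparse structure" means in practice: pruning roughly the same fraction of weights at coarser block granularity yields a mask that needs far less index storage and gives more regular memory access. The block shapes and keep ratio are illustrative, not the paper's actual sweep of granularities.

```python
import numpy as np

def prune_by_granularity(w, block, keep_ratio=0.25):
    """Magnitude-prune `w` in blocks of shape `block`.

    block=(1, 1) is fine-grained (irregular) pruning; larger blocks give
    coarser, more regular sparsity with one index per block instead of
    one per element.
    """
    h, w_ = w.shape
    bh, bw = block
    scores = np.abs(w).reshape(h // bh, bh, w_ // bw, bw).mean(axis=(1, 3))
    k = max(1, int(scores.size * keep_ratio))
    thresh = np.partition(scores.ravel(), -k)[-k]
    block_mask = scores >= thresh
    mask = np.kron(block_mask, np.ones(block, dtype=bool))  # expand to elements
    return w * mask, block_mask.sum()

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
for block in [(1, 1), (1, 4), (4, 4)]:
    pruned, n_indices = prune_by_granularity(w, block)
    print(f"block {block}: {np.count_nonzero(pruned)}/{w.size} weights kept, "
          f"{n_indices} block indices to store")
```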

Trained Ternary Quantization

4 code implementations 4 Dec 2016 Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally

To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values.

Quantization
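
A simplified forward pass in the spirit of TTQ: weights above a threshold map to a positive scale, those below the negative threshold to a negative scale, the rest to zero. In the paper the two scales are learned by backpropagation (with full-precision weights kept for the backward pass); here they are fixed inputs and the threshold factor is an illustrative value.

```python
import numpy as np

def ternarize(w, w_pos, w_neg, t=0.05):
    """Map full-precision weights to the ternary set {-w_neg, 0, +w_pos}."""
    delta = t * np.abs(w).max()                 # magnitude threshold
    return np.where(w > delta, w_pos,
                    np.where(w < -delta, -w_neg, 0.0))

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=1000)
w_t = ternarize(w, w_pos=0.12, w_neg=0.09)
print("distinct values:", np.unique(w_t))       # [-0.09, 0.0, 0.12]
print("zero fraction  :", np.mean(w_t == 0.0))
```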

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

no code implementations 1 Dec 2016 Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally

Evaluated on an LSTM speech recognition benchmark, ESE is 43x and 3x faster than Core i7-5930K CPU and Pascal Titan X GPU implementations.

Quantization speech-recognition +1

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

2 code implementations 15 Jul 2016 Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.

8k Caption Generation +3
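
A toy, framework-free sketch of the dense-sparse-dense schedule: the "training" is gradient descent on a synthetic regression target standing in for a real network, and the step counts and 50% sparsity are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=256)                  # toy regression problem

def train_step(w, mask):
    """One toy training step: gradient descent toward `target`, with the
    gradient masked so weights pruned in the sparse phase stay at zero."""
    return w - 0.1 * (w - target) * mask

w = np.zeros_like(target)
dense = np.ones_like(w, dtype=bool)

for _ in range(50):                            # D: initial dense training
    w = train_step(w, dense)

k = w.size // 2                                # S: prune half by magnitude...
thresh = np.partition(np.abs(w), k)[k]
mask = np.abs(w) > thresh
w = w * mask
for _ in range(50):                            # ...and retrain under the mask
    w = train_step(w, mask)

for _ in range(50):                            # D: drop the mask, retrain densely
    w = train_step(w, dense)

print("kept in sparse phase:", int(mask.sum()), "of", w.size)
print("final error:", float(np.linalg.norm(w - target)))
```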

EIE: Efficient Inference Engine on Compressed Deep Neural Network

4 code implementations 4 Feb 2016 Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

EIE has a processing power of 102 GOPS working directly on a compressed network, corresponding to 3 TOPS on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600mW.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

15 code implementations 1 Oct 2015 Song Han, Huizi Mao, William J. Dally

To address this limitation, we introduce "deep compression", a three-stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.

Network Pruning Quantization
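
A rough sketch of the three stages applied to a single random weight matrix. The real pipeline retrains both the surviving weights and the shared centroids, the Huffman stage is approximated here by the index entropy, and the storage for the pruned positions is ignored, so the printed ratio is only indicative.

```python
import numpy as np

def deep_compression_sketch(w, prune_ratio=0.9, n_clusters=16):
    # 1. Magnitude pruning: drop the smallest |w|.
    k = int(w.size * prune_ratio)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    nz = w[np.abs(w) > thresh]

    # 2. Trained quantization, minus the training: share a small codebook
    #    of centroids fitted with plain 1-D k-means.
    centroids = np.linspace(nz.min(), nz.max(), n_clusters)
    for _ in range(20):
        idx = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = nz[idx == c].mean()

    # 3. Huffman coding, approximated by the entropy of the cluster indices.
    freq = np.bincount(idx, minlength=n_clusters) / idx.size
    bits_per_index = -np.sum(freq[freq > 0] * np.log2(freq[freq > 0]))

    dense_bits = w.size * 32
    compressed_bits = nz.size * bits_per_index + n_clusters * 32  # indices + codebook
    print(f"kept {nz.size}/{w.size} weights, "
          f"~{dense_bits / compressed_bits:.0f}x smaller (ignoring position indices)")

rng = np.random.default_rng(0)
deep_compression_sketch(rng.normal(0, 0.05, size=(256, 256)))
```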

Learning both Weights and Connections for Efficient Neural Networks

7 code implementations NeurIPS 2015 Song Han, Jeff Pool, John Tran, William J. Dally

On the ImageNet dataset, our method reduced the number of parameters of AlexNet by 9x, from 61 million to 6.7 million, without incurring accuracy loss.
