Search Results for author: William J. Dally

Found 18 papers, 11 papers with code

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training

no code implementations 13 Jun 2022 Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany

Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT).

Quantization
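
The clipping trade-off this abstract refers to can be seen in a small NumPy sketch. The clip value alpha and the 4-bit width below are illustrative assumptions, not the optimal-clipping rule or magnitude-aware gradient the paper derives.

```python
import numpy as np

def clipped_uniform_quantize(x, alpha, n_bits=4):
    """Symmetric uniform quantization of x clipped to [-alpha, alpha].

    A small alpha gives a fine quantization grid but clips large values;
    a large alpha preserves outliers but coarsens the grid. This only
    illustrates the trade-off the paper optimizes during QAT.
    """
    n_levels = 2 ** (n_bits - 1) - 1            # e.g. 7 levels per side for 4 bits
    step = alpha / n_levels                     # quantization step size
    x_clipped = np.clip(x, -alpha, alpha)       # clipping stage
    return np.round(x_clipped / step) * step    # round-to-nearest stage

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=100_000)          # toy weight distribution
for alpha in (0.5, 1.0, 2.0, 4.0):
    mse = np.mean((w - clipped_uniform_quantize(w, alpha)) ** 2)
    print(f"alpha={alpha:3.1f}  MSE={mse:.5f}")
```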

PatchNet -- Short-range Template Matching for Efficient Video Processing

1 code implementation 10 Mar 2021 Huizi Mao, Sibo Zhu, Song Han, William J. Dally

Object recognition is a fundamental problem in many video processing tasks; accurately locating seen objects at low computation cost paves the way for on-device video recognition.

Object Detection +5

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

no code implementations 8 Feb 2021 Steve Dai, Rangharajan Venkatesan, Haoxing Ren, Brian Zimmer, William J. Dally, Brucek Khailany

4-bit weights and 8-bit activations achieve near-full-precision accuracy for both BERT-base and BERT-large on SQuAD while reducing area by 26% compared to an 8-bit baseline.

Math Quantization
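
A minimal sketch of the per-vector scaling idea, under illustrative assumptions (vectors of 16 weights, a plain max-abs scale per vector); the paper's two-level scale factors and the hardware mapping are not modeled.

```python
import numpy as np

def per_vector_quantize(w, vec_len=16, n_bits=4):
    """Quantize each length-`vec_len` vector of w with its own scale factor.

    A per-vector scale adapts to the local dynamic range, so a narrow
    integer format (here 4 bits) loses less accuracy than a single scale
    shared by the whole tensor.
    """
    q_max = 2 ** (n_bits - 1) - 1                       # 7 for signed 4-bit
    vecs = w.reshape(-1, vec_len)                       # group into vectors
    scale = np.abs(vecs).max(axis=1, keepdims=True) / q_max
    scale = np.where(scale == 0, 1.0, scale)            # avoid divide-by-zero
    q = np.clip(np.round(vecs / scale), -q_max, q_max)  # integer codes
    return (q * scale).reshape(w.shape)                 # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=4096)
err = np.mean((w - per_vector_quantize(w)) ** 2)
print(f"per-vector 4-bit MSE: {err:.3e}")
```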

SpArch: Efficient Architecture for Sparse Matrix Multiplication

no code implementations 20 Feb 2020 Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally

We then propose a condensed matrix representation that reduces the number of partial matrices by three orders of magnitude and thus reduces DRAM access by 5.4x.

Hardware Architecture Distributed, Parallel, and Cluster Computing
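
For context on what a "partial matrix" is here: in outer-product SpGEMM, each nonzero column of A multiplied by the matching row of B yields one rank-1 partial matrix, and the result is their sum. The SciPy sketch below only illustrates that baseline; SpArch's condensed representation and its on-chip merger are not modeled, and the sizes and densities are arbitrary.

```python
import numpy as np
from scipy import sparse

A = sparse.random(64, 64, density=0.05, format="csc", random_state=0)
B = sparse.random(64, 64, density=0.05, format="csr", random_state=1)

partials = []
for k in range(A.shape[1]):
    col = A[:, [k]]                    # k-th sparse column of A
    row = B[[k], :]                    # k-th sparse row of B
    if col.nnz and row.nnz:
        partials.append(col @ row)     # one rank-1 partial matrix

C = partials[0]
for P in partials[1:]:
    C = C + P                          # the merging that dominates DRAM traffic

print("partial matrices merged:", len(partials))
print("matches A @ B:", np.allclose(C.toarray(), (A @ B).toarray()))
```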

Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference

1 code implementation Design Automation Conference (DAC) 2019 Angad S. Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J. Dally, C. Thomas Gray

We also introduce an energy model to predict the requirements of high-accuracy AMS hardware running large networks and use it to show that for ADC-dominated designs, there is a direct tradeoff between energy efficiency and network accuracy.

CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video

no code implementations 30 Sep 2018 Huizi Mao, Taeyoung Kong, William J. Dally

Experiments on the KITTI dataset show that CaTDet reduces operation count by 5.1-8.7x with the same mean Average Precision (mAP) as the single-model Faster R-CNN detector and incurs an additional delay of 0.3 frame.

Object Detection

Efficient Sparse-Winograd Convolutional Neural Networks

1 code implementation ICLR 2018 Xingyu Liu, Jeff Pool, Song Han, William J. Dally

First, we move the ReLU operation into the Winograd domain to increase the sparsity of the transformed activations.

Network Pruning
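
The "ReLU in the Winograd domain" idea can be illustrated with a toy 1-D F(2,3) tile. The paper works with 2-D tiles and also learns and prunes weights directly in the transform domain, none of which is modeled here; the input and filter values are arbitrary.

```python
import numpy as np

# Winograd F(2,3) transform matrices: a length-4 input tile and a
# length-3 filter produce 2 outputs via Y = A_T @ ((G @ g) * (B_T @ d)).
B_T = np.array([[1, 0, -1, 0],
                [0, 1,  1, 0],
                [0, -1, 1, 0],
                [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

d = np.array([0.3, -1.2, 0.7, -0.4])   # input activation tile
g = np.array([0.2, -0.5, 0.1])         # filter

# Conventional ordering: ReLU in the spatial domain, then transform.
spatial_relu = A_T @ ((G @ g) * (B_T @ np.maximum(d, 0.0)))

# Sparse-Winograd ordering: ReLU applied to the *transformed* activations,
# so the zeros it creates survive into the element-wise multiply stage.
winograd_relu = A_T @ ((G @ g) * np.maximum(B_T @ d, 0.0))

print("spatial-ReLU tile output :", spatial_relu)
print("Winograd-ReLU tile output:", winograd_relu)
```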

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

3 code implementations ICLR 2018 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.

Federated Learning Image Classification +3
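
A bare-bones sketch of the gradient sparsification with local accumulation that underlies Deep Gradient Compression; the 99.9% sparsity is an illustrative setting, and the paper's momentum correction, gradient clipping and warm-up are omitted.

```python
import numpy as np

def sparsify_gradient(grad, residual, sparsity=0.999):
    """One step of top-k gradient sparsification with local accumulation.

    Only the largest-magnitude entries of (residual + grad) are sent;
    the rest stay in `residual` and accumulate until they grow large
    enough to be transmitted in a later step.
    """
    acc = residual + grad
    k = max(1, int(acc.size * (1.0 - sparsity)))       # entries to send
    thresh = np.partition(np.abs(acc), -k)[-k]
    mask = np.abs(acc) >= thresh
    sent = np.where(mask, acc, 0.0)                    # communicated update
    new_residual = np.where(mask, 0.0, acc)            # kept for later steps
    return sent, new_residual

rng = np.random.default_rng(0)
residual = np.zeros(10_000)
for step in range(3):
    grad = rng.normal(0, 1e-2, size=10_000)
    sent, residual = sparsify_gradient(grad, residual)
    print(f"step {step}: sent {np.count_nonzero(sent)} of {grad.size} values")
```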

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

no code implementations 24 May 2017 Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally

Since memory reference is more than two orders of magnitude more expensive than arithmetic operations, the regularity of sparse structure leads to more efficient hardware design.
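
A small sketch of what "regularity of sparse structure" means in practice: pruning roughly the same fraction of weights at coarser block granularity yields a mask that needs far less index storage and gives more regular memory access. The block shapes and keep ratio are illustrative, not the paper's actual sweep of granularities.

```python
import numpy as np

def prune_by_granularity(w, block, keep_ratio=0.25):
    """Magnitude-prune `w` in blocks of shape `block`.

    block=(1, 1) is fine-grained (irregular) pruning; larger blocks give
    coarser, more regular sparsity with one index per block instead of
    one per element.
    """
    h, w_ = w.shape
    bh, bw = block
    scores = np.abs(w).reshape(h // bh, bh, w_ // bw, bw).mean(axis=(1, 3))
    k = max(1, int(scores.size * keep_ratio))
    thresh = np.partition(scores.ravel(), -k)[-k]
    block_mask = scores >= thresh
    mask = np.kron(block_mask, np.ones(block, dtype=bool))  # expand to elements
    return w * mask, block_mask.sum()

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
for block in [(1, 1), (1, 4), (4, 4)]:
    pruned, n_indices = prune_by_granularity(w, block)
    print(f"block {block}: {np.count_nonzero(pruned)}/{w.size} weights kept, "
          f"{n_indices} block indices to store")
```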

Trained Ternary Quantization

4 code implementations 4 Dec 2016 Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally

To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values.

Quantization
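
A simplified forward pass in the spirit of TTQ: weights above a threshold map to a positive scale, those below the negative threshold to a negative scale, the rest to zero. In the paper the two scales are learned by backpropagation (with full-precision weights kept for the backward pass); here they are fixed inputs and the threshold factor is an illustrative value.

```python
import numpy as np

def ternarize(w, w_pos, w_neg, t=0.05):
    """Map full-precision weights to the ternary set {-w_neg, 0, +w_pos}."""
    delta = t * np.abs(w).max()                 # magnitude threshold
    return np.where(w > delta, w_pos,
                    np.where(w < -delta, -w_neg, 0.0))

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=1000)
w_t = ternarize(w, w_pos=0.12, w_neg=0.09)
print("distinct values:", np.unique(w_t))       # [-0.09, 0.0, 0.12]
print("zero fraction  :", np.mean(w_t == 0.0))
```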

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

no code implementations 1 Dec 2016 Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally

Evaluated on an LSTM speech recognition benchmark, ESE is 43x and 3x faster than Core i7-5930K CPU and Pascal Titan X GPU implementations.

Quantization speech-recognition +1

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

2 code implementations 15 Jul 2016 Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.

8k Caption Generation +3
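
A toy, framework-free sketch of the dense-sparse-dense schedule: the "training" is gradient descent on a synthetic regression target standing in for a real network, and the step counts and 50% sparsity are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=256)                  # toy regression problem

def train_step(w, mask):
    """One toy training step: gradient descent toward `target`, with the
    gradient masked so weights pruned in the sparse phase stay at zero."""
    return w - 0.1 * (w - target) * mask

w = np.zeros_like(target)
dense = np.ones_like(w, dtype=bool)

for _ in range(50):                            # D: initial dense training
    w = train_step(w, dense)

k = w.size // 2                                # S: prune half by magnitude...
thresh = np.partition(np.abs(w), k)[k]
mask = np.abs(w) > thresh
w = w * mask
for _ in range(50):                            # ...and retrain under the mask
    w = train_step(w, mask)

for _ in range(50):                            # D: drop the mask, retrain densely
    w = train_step(w, dense)

print("kept in sparse phase:", int(mask.sum()), "of", w.size)
print("final error:", float(np.linalg.norm(w - target)))
```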

EIE: Efficient Inference Engine on Compressed Deep Neural Network

4 code implementations 4 Feb 2016 Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

EIE has a processing power of 102 GOPS working directly on a compressed network, corresponding to 3 TOPS on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600mW.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

15 code implementations 1 Oct 2015 Song Han, Huizi Mao, William J. Dally

To address this limitation, we introduce "deep compression", a three-stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.

Network Pruning Quantization
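
A rough sketch of the three stages applied to a single random weight matrix. The real pipeline retrains both the surviving weights and the shared centroids, the Huffman stage is approximated here by the index entropy, and the storage for the pruned positions is ignored, so the printed ratio is only indicative.

```python
import numpy as np

def deep_compression_sketch(w, prune_ratio=0.9, n_clusters=16):
    # 1. Magnitude pruning: drop the smallest |w|.
    k = int(w.size * prune_ratio)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    nz = w[np.abs(w) > thresh]

    # 2. Trained quantization, minus the training: share a small codebook
    #    of centroids fitted with plain 1-D k-means.
    centroids = np.linspace(nz.min(), nz.max(), n_clusters)
    for _ in range(20):
        idx = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = nz[idx == c].mean()

    # 3. Huffman coding, approximated by the entropy of the cluster indices.
    freq = np.bincount(idx, minlength=n_clusters) / idx.size
    bits_per_index = -np.sum(freq[freq > 0] * np.log2(freq[freq > 0]))

    dense_bits = w.size * 32
    compressed_bits = nz.size * bits_per_index + n_clusters * 32  # indices + codebook
    print(f"kept {nz.size}/{w.size} weights, "
          f"~{dense_bits / compressed_bits:.0f}x smaller (ignoring position indices)")

rng = np.random.default_rng(0)
deep_compression_sketch(rng.normal(0, 0.05, size=(256, 256)))
```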

Learning both Weights and Connections for Efficient Neural Networks

7 code implementations NeurIPS 2015 Song Han, Jeff Pool, John Tran, William J. Dally

On the ImageNet dataset, our method reduced the number of parameters of AlexNet by 9x, from 61 million to 6.7 million, without incurring accuracy loss.
