Search Results for author: Zhekai Zhang

Found 5 papers, 2 papers with code

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

no code implementations · 17 Dec 2020 · Hanrui Wang, Zhekai Zhang, Song Han

Inspired by the high redundancy of human languages, we propose the novel cascade token pruning to prune away unimportant tokens in the sentence.

Quantization, Sentence
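The snippet above describes SpAtten's cascade token pruning: tokens that receive little attention are pruned away, since human language is highly redundant. A minimal software sketch of that idea, scoring each token by the cumulative attention it receives (the function name, keep ratio, and exact scoring rule are illustrative assumptions, not the paper's precise mechanism):

```python
import numpy as np

def cascade_token_prune(tokens, attn, keep_ratio=0.5):
    """Hypothetical sketch of cascade token pruning: rank tokens by the
    attention they receive (summed over heads and queries) and keep only
    the top fraction, preserving original token order."""
    # attn: array of shape [heads, num_queries, num_keys] with attention probabilities
    importance = attn.sum(axis=(0, 1))           # cumulative score per token
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(importance)[-k:])  # top-k indices, in order
    return [tokens[i] for i in keep]
```

In the paper this pruning is cascaded across layers in hardware, so later layers process progressively fewer tokens; the sketch shows only a single pruning step.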

Once for All: Train One Network and Specialize it for Efficient Deployment

1 code implementation · ICLR 2020 · Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

Most of the traditional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally expensive and unscalable.

Neural Architecture Search

SpArch: Efficient Architecture for Sparse Matrix Multiplication

no code implementations · 20 Feb 2020 · Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally

We then propose a condensed matrix representation that reduces the number of partial matrices by three orders of magnitude and thus reduces DRAM access by 5.4x.

Hardware Architecture; Distributed, Parallel, and Cluster Computing
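SpArch accelerates outer-product sparse matrix multiplication, where each nonzero of A pairs with a row of B to form partial products that must then be merged. A small software sketch of that dataflow, using coordinate-format dicts (this eager-merging version is purely illustrative; the paper's contribution is the condensed representation and hardware merge tree that cut the DRAM traffic this step incurs):

```python
from collections import defaultdict

def outer_product_spmm(A, B):
    """Illustrative outer-product sparse matmul. A and B are sparse
    matrices stored as dicts mapping (row, col) -> value."""
    # Index B's nonzeros by row so each A nonzero can find its partners.
    B_rows = defaultdict(list)
    for (k, j), v in B.items():
        B_rows[k].append((j, v))
    # Each A[i, k] * B[k, j] is a partial product; accumulate (merge) into C.
    C = defaultdict(float)
    for (i, k), a in A.items():
        for j, b in B_rows[k]:
            C[(i, j)] += a * b
    return dict(C)
```

In hardware the partial products are too numerous to merge on-chip at once, which is why condensing their representation matters.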

Once-for-All: Train One Network and Specialize it for Efficient Deployment

10 code implementations · 26 Aug 2019 · Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or same accuracy but 1.5x faster than MobileNetV3 and 2.6x faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and $CO_2$ emission by many orders of magnitude.

Neural Architecture Search
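The OFA abstract contrasts per-device retraining with training one network once and then picking a specialized sub-network per deployment target. A toy sketch of that deployment step, searching an elastic design space for a sub-network under a latency budget (the search space, latency proxy, and function names are invented for illustration; real OFA uses a trained super-network and a measured latency predictor per device):

```python
import random

# Hypothetical elastic design space: each sub-network picks one value per axis.
SPACE = {"depth": [2, 3, 4], "width": [3, 4, 6], "kernel": [3, 5, 7]}

def sample_subnet(rng):
    """Sample one sub-network configuration from the design space."""
    return {k: rng.choice(v) for k, v in SPACE.items()}

def estimated_latency_ms(cfg):
    # Toy latency proxy; OFA instead measures/predicts latency per device.
    return cfg["depth"] * cfg["width"] * cfg["kernel"] * 0.1

def specialize(budget_ms, trials=200, seed=0):
    """Random search for the largest sub-network meeting the budget,
    with no retraining: every candidate is a slice of the same supernet."""
    rng = random.Random(seed)
    best, best_size = None, -1
    for _ in range(trials):
        cfg = sample_subnet(rng)
        if estimated_latency_ms(cfg) <= budget_ms:
            size = cfg["depth"] * cfg["width"]  # crude capacity proxy
            if size > best_size:
                best, best_size = cfg, size
    return best
```

The point of the abstract is exactly this decoupling: the expensive training happens once, and specialization per device reduces to a cheap search like the one sketched here.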
