no code implementations • 31 Jan 2024 • Dingyi Dai, Yichi Zhang, Jiahao Zhang, Zhanqiu Hu, Yaohui Cai, Qi Sun, Zhiru Zhang
Quantization is a crucial technique for deploying deep learning models on resource-constrained devices, such as embedded FPGAs.
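For context, here is a minimal sketch of symmetric uniform quantization; this is a generic illustration, not the specific scheme proposed in the paper:

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Symmetric uniform quantization of a weight tensor to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax            # one scale factor for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                           # dequantize as q * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_uniform(w)
print(np.abs(w - q * scale).max())            # worst-case rounding error is about scale / 2
```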
no code implementations • 23 Dec 2023 • Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang
Experimental results demonstrate that our approach can achieve up to 13.4x speedup when compared to previous FPGA-based accelerators for the BERT model.
1 code implementation • NeurIPS 2023 • Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa
This work studies post-training parameter quantization in large language models (LLMs).
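For background, a minimal round-to-nearest post-training quantization sketch for a single linear layer; this is a common PTQ baseline, not the method introduced in the paper, and the bit-width and per-row scaling are illustrative choices:

```python
import torch

@torch.no_grad()
def rtn_quantize_linear(linear, num_bits=4):
    """Round-to-nearest weight quantization of a Linear layer, one scale per
    output row -- a simple post-training baseline, applied without retraining."""
    w = linear.weight.data                              # (out_features, in_features)
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax    # per-row scale
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    linear.weight.data = q * scale                      # store dequantized weights
    return linear

layer = rtn_quantize_linear(torch.nn.Linear(4096, 4096), num_bits=4)
```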
no code implementations • 4 Mar 2022 • Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang
In addition, since PreCropping compresses CNNs at initialization, the computational and memory costs of CNNs are reduced for both training and inference on commodity hardware.
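A hypothetical sketch of cropping channels at initialization; the uniform channel selection here is only a placeholder, not the paper's criterion:

```python
import torch.nn as nn

def crop_conv_at_init(conv, keep_ratio=0.5):
    """Replace a Conv2d with a thinner copy before any training happens, so the
    saved compute and memory apply to both training and inference."""
    kept = max(1, int(conv.out_channels * keep_ratio))
    return nn.Conv2d(conv.in_channels, kept,
                     kernel_size=conv.kernel_size, stride=conv.stride,
                     padding=conv.padding, bias=conv.bias is not None)

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
small = crop_conv_at_init(conv, keep_ratio=0.5)   # 64 output channels instead of 128
```

In a real network the following layer's input channels would have to shrink accordingly.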
2 code implementations • 7 Feb 2021 • Wuxinlin Cheng, Chenhui Deng, Zhiqiang Zhao, Yaohui Cai, Zhiru Zhang, Zhuo Feng
A black-box spectral method is introduced for evaluating the adversarial robustness of a given machine learning (ML) model.
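As a loose point of reference only, a naive black-box sensitivity probe via random finite differences; this is not the spectral method proposed in the paper:

```python
import torch

@torch.no_grad()
def sensitivity_probe(model, x, eps=1e-3, num_dirs=32):
    """Crude black-box sensitivity estimate: probe random unit directions and
    record the largest finite-difference change in the model output."""
    y0 = model(x)
    worst = 0.0
    for _ in range(num_dirs):
        v = torch.randn_like(x)
        v = v / v.norm()
        worst = max(worst, ((model(x + eps * v) - y0).norm() / eps).item())
    return worst
```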
3 code implementations • 12 Jun 2020 • Zhen Dong, Dequan Wang, Qijing Huang, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek
Deploying deep learning models on embedded systems has been challenging due to limited computing resources.
2 code implementations • 19 Feb 2020 • Qijing Huang, Dequan Wang, Yizhao Gao, Yaohui Cai, Zhen Dong, Bichen Wu, Kurt Keutzer, John Wawrzynek
In this work, we first investigate the overhead of the deformable convolution on embedded FPGA SoCs, and then show the accuracy-latency tradeoffs for a set of algorithm modifications including full versus depthwise, fixed-shape, and limited-range.
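A rough sketch of what a "limited-range" modification could look like, assuming torchvision's deform_conv2d and an illustrative clamp range:

```python
import torch
from torchvision.ops import deform_conv2d

x = torch.randn(1, 16, 32, 32)
weight = torch.randn(16, 16, 3, 3)
offset = torch.randn(1, 2 * 3 * 3, 32, 32)   # (dy, dx) for each of the 3x3 kernel taps
offset = offset.clamp(-2.0, 2.0)             # limited-range: cap how far taps may move
y = deform_conv2d(x, offset, weight, padding=1)
print(y.shape)                                # torch.Size([1, 16, 32, 32])
```

Restricting the offsets bounds the sampling window, which simplifies line buffering in hardware.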
3 code implementations • CVPR 2020 • Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30 seconds (0.5% of one epoch of training time for ResNet-50 on ImageNet).
Ranked #1 on Data-Free Quantization on CIFAR-10 (W8A8 Top-1 Accuracy metric)
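A much-simplified sketch of the distilled-data idea behind data-free quantization, i.e. optimizing synthetic inputs so their activation statistics match the BatchNorm running statistics; the step count and learning rate are placeholders, and a pretrained model would be used in practice (random weights here only keep the sketch self-contained):

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()
x = torch.randn(32, 3, 224, 224, requires_grad=True)   # synthetic "distilled" batch
opt = torch.optim.Adam([x], lr=0.5)
bn_losses = []

def bn_hook(mod, inp, out):
    # Compare the batch statistics seen by each BatchNorm layer with its
    # stored running statistics -- no real training data is required.
    mean = inp[0].mean(dim=(0, 2, 3))
    var = inp[0].var(dim=(0, 2, 3))
    bn_losses.append(((mean - mod.running_mean) ** 2).mean()
                     + ((var - mod.running_var) ** 2).mean())

for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.register_forward_hook(bn_hook)

for _ in range(50):                       # a few hundred steps in practice
    bn_losses.clear()
    opt.zero_grad()
    model(x)
    torch.stack(bn_losses).sum().backward()
    opt.step()
```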
2 code implementations • NeurIPS 2020 • Zhen Dong, Zhewei Yao, Yaohui Cai, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
However, the search space for a mixed-precision quantization is exponential in the number of layers.
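As a concrete illustration of that exponential blow-up (the bit-width choices and layer count below are made up):

```python
# With B candidate bit-widths per layer and L layers, exhaustive mixed-precision
# search must consider B**L configurations.
num_bit_choices, num_layers = 4, 50        # illustrative: 4 bit-width options, ~ResNet-50 depth
print(num_bit_choices ** num_layers)       # 1267650600228229401496703205376, about 1.3e30
```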