Search Results for author: Yaohui Cai

Found 9 papers, 6 papers with code

Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs

no code implementations31 Jan 2024 Dingyi Dai, Yichi Zhang, Jiahao Zhang, Zhanqiu Hu, Yaohui Cai, Qi Sun, Zhiru Zhang

Quantization is a crucial technique for deploying deep learning models on resource-constrained devices, such as embedded FPGAs.

Quantization
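
No code is listed for this paper, but the general idea it builds on is easy to illustrate: fixed-point quantization represents values with a fixed number of integer and fractional bits. The snippet below is a minimal, generic sketch; the bit widths are illustrative assumptions, and the paper's trainable bit allocation is not reproduced here.

```python
import torch

def fixed_point_quantize(x: torch.Tensor, total_bits: int = 8, frac_bits: int = 4) -> torch.Tensor:
    """Quantize a float tensor to signed fixed-point with `frac_bits` fractional bits.

    Generic illustration only; the paper makes the format trainable, which is
    not reproduced here.
    """
    scale = 2.0 ** frac_bits                 # value resolution of one LSB
    qmin = -(2 ** (total_bits - 1))          # e.g. -128 for 8-bit signed
    qmax = 2 ** (total_bits - 1) - 1         # e.g. +127
    q = torch.clamp(torch.round(x * scale), qmin, qmax)
    return q / scale                         # "fake-quantized" float values

x = torch.randn(4)
print(fixed_point_quantize(x))
```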

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference

no code implementations23 Dec 2023 Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang

Experimental results demonstrate that our approach can achieve up to a 13.4x speedup compared to previous FPGA-based accelerators for the BERT model.

Language Modelling · Large Language Model

Structured Pruning is All You Need for Pruning CNNs at Initialization

no code implementations4 Mar 2022 Yaohui Cai, Weizhe Hua, Hongzheng Chen, G. Edward Suh, Christopher De Sa, Zhiru Zhang

In addition, since PreCropping compresses CNNs at initialization, the computational and memory costs of CNNs are reduced for both training and inference on commodity hardware.

Model Compression
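
As a rough sketch of what structured pruning at initialization can look like in practice (not the paper's PreCropping criterion; the keep ratio and the norm-based ranking below are illustrative assumptions), whole output channels can be dropped from a freshly initialized layer before any training:

```python
import torch
import torch.nn as nn

def crop_channels_at_init(conv: nn.Conv2d, keep_ratio: float = 0.75) -> nn.Conv2d:
    """Return a thinner Conv2d keeping the output channels with the largest
    initial weight norm. Hypothetical sketch of channel pruning at
    initialization; PreCropping's actual saliency measure is not reproduced."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    norms = conv.weight.detach().flatten(1).norm(dim=1)   # one norm per output channel
    keep = torch.topk(norms, n_keep).indices.sort().values
    new_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
    return new_conv

layer = nn.Conv2d(3, 64, 3, padding=1)
print(crop_channels_at_init(layer, keep_ratio=0.5))        # Conv2d(3, 32, ...)
```

A real implementation would also shrink the input channels of the following layer so the cropped network stays consistent end to end.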

SPADE: A Spectral Method for Black-Box Adversarial Robustness Evaluation

2 code implementations7 Feb 2021 Wuxinlin Cheng, Chenhui Deng, Zhiqiang Zhao, Yaohui Cai, Zhiru Zhang, Zhuo Feng

A black-box spectral method is introduced for evaluating the adversarial robustness of a given machine learning (ML) model.

Adversarial Robustness · Graph Embedding
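
The paper's code is linked above; for intuition only, one way to phrase a spectral robustness score is as the largest generalized eigenvalue of an output-graph Laplacian against an input-graph Laplacian, where both graphs are k-NN graphs over the inputs and the model's outputs. The sketch below is a loose, hypothetical reading of that idea; the graph construction, k, and the regularization eps are assumptions, not the authors' settings.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph

def spectral_robustness_score(X, Y, k=10, eps=1e-6):
    """Largest generalized eigenvalue of L_out w.r.t. a regularized L_in.
    Loose illustration of a spectral robustness score; not the SPADE code."""
    def laplacian(Z):
        A = kneighbors_graph(Z, k, mode="connectivity", include_self=False)
        A = 0.5 * (A + A.T)                                  # symmetrize the k-NN graph
        D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
        return (D - A).tocsc()

    L_in, L_out = laplacian(X), laplacian(Y)
    M = L_in + eps * sp.identity(X.shape[0], format="csc")   # keep M positive definite
    lam, _ = eigsh(L_out, k=1, M=M, which="LM")
    return float(lam[0])

# Hypothetical usage: X are flattened inputs, Y the model's outputs on X.
X = np.random.randn(200, 32)
Y = np.random.randn(200, 10)
print(spectral_robustness_score(X, Y))
```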

Algorithm-hardware Co-design for Deformable Convolution

2 code implementations19 Feb 2020 Qijing Huang, Dequan Wang, Yizhao Gao, Yaohui Cai, Zhen Dong, Bichen Wu, Kurt Keutzer, John Wawrzynek

In this work, we first investigate the overhead of the deformable convolution on embedded FPGA SoCs, and then show the accuracy-latency tradeoffs for a set of algorithm modifications including full versus depthwise, fixed-shape, and limited-range.

Image Classification · Instance Segmentation · +4
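
Deformable convolution itself is available off the shelf in torchvision, which makes the operation (and the "limited-range" modification mentioned in the abstract) easy to demonstrate. The snippet below is a generic illustration; the tensor shapes and the clamp bound are assumptions, and it is unrelated to the paper's FPGA implementation.

```python
import torch
from torchvision.ops import deform_conv2d

# Hypothetical shapes for illustration.
x = torch.randn(1, 8, 16, 16)          # (N, C_in, H, W)
weight = torch.randn(16, 8, 3, 3)      # (C_out, C_in, kH, kW)

# One (dx, dy) offset per kernel tap and output position: 2 * kH * kW channels.
offset = torch.randn(1, 2 * 3 * 3, 16, 16)

# "Limited-range" variant from the abstract: clamp the learned offsets so
# sampling stays within a small neighborhood (more hardware-friendly).
offset = offset.clamp(-2.0, 2.0)

y = deform_conv2d(x, offset, weight, padding=1)
print(y.shape)                         # torch.Size([1, 16, 16, 16])
```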

ZeroQ: A Novel Zero Shot Quantization Framework

3 code implementations CVPR 2020 Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30 seconds (0.5% of one epoch of ResNet50 training time on ImageNet).

Ranked #1 on Data Free Quantization on CIFAR-10 (W8A8 Top-1 Accuracy metric)

Data Free Quantization · Neural Network Compression
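
For context on what "zero shot" means here: quantization ranges are usually calibrated on real data, whereas ZeroQ-style methods synthesize calibration inputs from the model alone. The sketch below optimizes random inputs so their batch-norm statistics match the pretrained model's running statistics; the batch size, learning rate, and step count are assumptions, and this is not the authors' released implementation (which is linked above).

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

bn_losses = []                                   # per-BN statistic-matching terms
hooks = []

def make_hook(bn: nn.BatchNorm2d):
    def hook(module, inputs, output):
        x = inputs[0]                            # pre-normalization activations
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        bn_losses.append(((mean - bn.running_mean) ** 2).mean()
                         + ((var - bn.running_var) ** 2).mean())
    return hook

for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        hooks.append(m.register_forward_hook(make_hook(m)))

x = torch.randn(8, 3, 224, 224, requires_grad=True)   # batch size assumed
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(50):                              # more iterations in practice
    bn_losses.clear()
    opt.zero_grad()
    model(x)
    torch.stack(bn_losses).sum().backward()
    opt.step()

for h in hooks:
    h.remove()
# `x` now serves as synthetic calibration data for choosing quantization ranges.
```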
