Search Results for author: Shulin Zeng

Found 5 papers, 0 papers with code

A Survey of FPGA Based Neural Network Accelerator

no code implementations • 24 Dec 2017 • Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, Huazhong Yang

Various FPGA based accelerator designs have been proposed with software and hardware optimization techniques to achieve high speed and energy efficiency.

Hardware Architecture

Paper
Add Code

Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud

no code implementations • 26 Mar 2020 • Shulin Zeng, Guohao Dai, Hanbo Sun, Kai Zhong, Guangjun Ge, Kaiyuan Guo, Yu Wang, Huazhong Yang

Currently, the majority of FPGA-based DNN accelerators in the cloud run in a time-division multiplexing way for multiple users sharing a single FPGA, and require re-compilation with $\sim$100 s overhead.

Paper
Add Code

Exploring the Potential of Low-bit Training of Convolutional Neural Networks

no code implementations • 4 Jun 2020 • Kai Zhong, Xuefei Ning, Guohao Dai, Zhenhua Zhu, Tianchen Zhao, Shulin Zeng, Yu Wang, Huazhong Yang

For training a variety of models on CIFAR-10, using 1-bit mantissa and 2-bit exponent is adequate to keep the accuracy loss within $1\%$.

Quantization

Paper
Add Code

Explore the Potential of CNN Low Bit Training

no code implementations • 1 Jan 2021 • Kai Zhong, Xuefei Ning, Tianchen Zhao, Zhenhua Zhu, Shulin Zeng, Guohao Dai, Yu Wang, Huazhong Yang

Through this dynamic precision framework, we can reduce the bit-width of convolution, which is the most computational cost, while keeping the training process close to the full precision floating-point training.

Quantization

Paper
Add Code

FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs

no code implementations • 8 Jan 2024 • Shulin Zeng, Jun Liu, Guohao Dai, Xinhao Yang, Tianyu Fu, Hongyi Wang, Wenheng Ma, Hanbo Sun, Shiyao Li, Zixiao Huang, Yadong Dai, Jintao Li, Zehao Wang, Ruoyu Zhang, Kairui Wen, Xuefei Ning, Yu Wang

However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads.

Computational Efficiency Language Modelling +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.