Search Results for author: Shaojie Xiang

Found 2 papers, 1 paper with code

Allo: A Programming Model for Composable Accelerator Design

2 code implementations · 7 Apr 2024 · Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang

For the GPT2 model, the inference latency of the Allo-generated accelerator is 1.7x lower than the NVIDIA A100 GPU with 5.4x higher energy efficiency, demonstrating the capability of Allo to handle large-scale designs.
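For context, Allo decouples the algorithm specification from hardware customizations, which are composed as schedule transformations on a Python kernel. Below is a minimal sketch of that style, loosely following the examples in the paper; the exact type names and schedule primitives are assumptions and may differ across Allo versions.

```python
# Minimal sketch of Allo's decoupled-customization style; type names and
# schedule primitives are assumptions based on the paper's examples.
import allo
from allo.ir.types import float32

# Algorithm specification: a plain 32x32 GEMM kernel.
def gemm(A: float32[32, 32], B: float32[32, 32]) -> float32[32, 32]:
    C: float32[32, 32] = 0.0
    for i, j, k in allo.grid(32, 32, 32):
        C[i, j] += A[i, k] * B[k, j]
    return C

# Hardware customizations are composed separately from the algorithm.
s = allo.customize(gemm)
s.reorder("k", "j")           # move the reduction loop outward
s.pipeline("j")               # pipeline the new innermost loop
mod = s.build(target="vhls")  # emit HLS code for FPGA synthesis
```

The point of the separation is that the `gemm` function stays a readable, verifiable specification, while optimizations like reordering and pipelining are applied and recombined without editing the kernel itself.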

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference

no code implementations · 23 Dec 2023 · Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang

Experimental results demonstrate our approach can achieve up to 13.4x speedup when compared to previous FPGA-based accelerators for the BERT model.

Tasks: Language Modelling · Large Language Model
