2 code implementations • 7 Apr 2024 • Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
For the GPT2 model, the Allo-generated accelerator delivers 1.7x lower inference latency and 5.4x higher energy efficiency than the NVIDIA A100 GPU, demonstrating the capability of Allo to handle large-scale designs.
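To give a flavor of the programming model, below is a minimal Allo sketch modeled on the examples in the open-source Allo repository: a tiny GEMM kernel whose schedule (loop reordering, pipelining) is customized separately from the algorithm. The sizes and the particular schedule primitives are illustrative assumptions, not the GPT2 accelerator design reported above.

```python
# Minimal Allo sketch: a small GEMM kernel plus a decoupled schedule.
# Illustrative only -- kernel sizes and customizations are assumptions,
# not the GPT2 accelerator from the paper.
import allo
from allo.ir.types import float32

def gemm(A: float32[32, 32], B: float32[32, 32]) -> float32[32, 32]:
    C: float32[32, 32] = 0.0
    for i, j, k in allo.grid(32, 32, 32):  # three nested loops over i, j, k
        C[i, j] += A[i, k] * B[k, j]
    return C

s = allo.customize(gemm)  # create a schedule decoupled from the algorithm
s.reorder("k", "j")       # interchange the j and k loops
s.pipeline("j")           # pipeline the (now innermost) j loop
mod = s.build()           # default target: CPU simulation via LLVM
```

Decoupling the algorithm from schedule primitives such as reorder and pipeline is what lets small kernels like this be progressively customized and composed into large designs.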
no code implementations • 23 Dec 2023 • Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang
Experimental results demonstrate that our approach achieves up to a 13.4x speedup over previous FPGA-based accelerators for the BERT model.
no code implementations • CVPR 2022 • Tianchen Zhao, Niansong Zhang, Xuefei Ning, He Wang, Li Yi, Yu Wang
We propose CodedVTR (Codebook-based Voxel TRansformer), which improves the data efficiency and generalization ability of 3D sparse voxel transformers.
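The underlying idea, restricting self-attention to combinations drawn from a learned codebook of attention prototypes rather than a freely computed attention map, can be sketched in plain PyTorch. This is a toy dense-tensor illustration under stated assumptions: the class name CodebookAttention and the parameter num_codes are hypothetical, and the actual CodedVTR operates on sparse voxels with geometry-aware codebook entries.

```python
# Toy sketch of codebook-based attention (hypothetical names, dense tensors).
# CodedVTR constrains sparse-voxel attention to a learned codebook of
# attention prototypes; here the same idea is shown for a dense sequence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookAttention(nn.Module):
    def __init__(self, dim: int, seq_len: int, num_codes: int = 8):
        super().__init__()
        # Each code is a learnable attention "prototype" over all positions.
        self.codebook = nn.Parameter(torch.randn(num_codes, seq_len, seq_len))
        self.to_q = nn.Linear(dim, num_codes)  # token -> code-selection logits
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        v = self.to_v(x)
        code_weights = F.softmax(self.to_q(x), dim=-1)  # (B, L, K)
        protos = F.softmax(self.codebook, dim=-1)       # (K, L, L)
        # attn[b, l, :] = sum_k code_weights[b, l, k] * protos[k, l, :]
        attn = torch.einsum("blk,kli->bli", code_weights, protos)
        return attn @ v                                  # (B, L, dim)

x = torch.randn(2, 16, 32)
out = CodebookAttention(dim=32, seq_len=16)(x)
print(out.shape)  # torch.Size([2, 16, 32])
```

Because each token can only mix a small set of shared attention prototypes, the space of attention maps is heavily regularized, which is the intuition behind the claimed gains in data efficiency and generalization.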
1 code implementation • 25 Nov 2020 • Xuefei Ning, Changcheng Tang, Wenshuo Li, Songyi Yang, Tianchen Zhao, Niansong Zhang, Tianyi Lu, Shuang Liang, Huazhong Yang, Yu Wang
Neural Architecture Search (NAS) has received extensive attention due to its capability to discover neural network architectures in an automated manner.