2 code implementations • 7 Apr 2024 • Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
For the GPT2 model, inference on the Allo-generated accelerator is 1.7x faster than on the NVIDIA A100 GPU, with 5.4x higher energy efficiency, demonstrating the capability of Allo to handle large-scale designs.