3 code implementations • 10 May 2022 • Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro
In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation.
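The core idea is to avoid recomputing every activation during the backward pass and instead recompute only the pieces that are memory-heavy but cheap to recompute, such as attention internals. Below is a minimal sketch of that selective-recomputation pattern using PyTorch's generic `torch.utils.checkpoint` utility; the module structure and names are illustrative assumptions, not the paper's Megatron-LM implementation (which also combines this with sequence parallelism).

```python
# Minimal sketch: selectively recompute only the attention block.
# Illustrative only -- not the paper's Megatron-LM implementation.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TransformerLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def _attn_block(self, x):
        # Attention produces large intermediate activations (e.g. softmax
        # scores) that are cheap to recompute, making it the natural
        # target for recomputation.
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

    def forward(self, x):
        # Recompute only the attention block during backward; keep the
        # cheaper-to-store MLP activations resident in memory.
        x = x + checkpoint(self._attn_block, self.norm1(x),
                           use_reentrant=False)
        x = x + self.mlp(self.norm2(x))
        return x

layer = TransformerLayer(d_model=256, n_heads=8)
x = torch.randn(2, 128, 256, requires_grad=True)
layer(x).sum().backward()  # attention is re-run in the backward pass
```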
no code implementations • 27 Apr 2020 • Sangkug Lym, Mattan Erez
Based on our evaluation, FlexSA with the proposed compilation heuristic improves compute resource utilization by 37% when pruning and training modern CNN models, compared to a conventional training accelerator with a large systolic array.
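The underutilization problem FlexSA targets is easy to see with a back-of-the-envelope model: a large fixed systolic array wastes processing elements whenever a layer's GEMM dimensions, shrunk by structured pruning, no longer fill the array. The sketch below is an illustrative utilization estimate under that simple tiling assumption, not FlexSA's evaluation methodology.

```python
# Rough utilization model for a fixed rows x cols systolic array running
# an (m x n) output GEMM, ignoring pipeline fill/drain. Illustrative
# assumption only -- not FlexSA's simulator or cost model.
import math

def systolic_utilization(m: int, n: int, rows: int, cols: int) -> float:
    """Fraction of PEs doing useful work when the output is tiled
    onto a rows x cols array of processing elements."""
    tiles = math.ceil(m / rows) * math.ceil(n / cols)
    return (m * n) / (tiles * rows * cols)

# A large layer fills a 128x128 array perfectly...
print(systolic_utilization(1024, 1024, 128, 128))  # 1.0
# ...but once pruning shrinks a dimension, many PEs sit idle.
print(systolic_utilization(1024, 160, 128, 128))   # 0.625
```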
no code implementations • 2 Apr 2019 • Sangkug Lym, Donghyuk Lee, Mike O'Connor, Niladrish Chatterjee, Mattan Erez
Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth.
1 code implementation • 26 Jan 2019 • Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez
State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights.
1 code implementation • 30 Sep 2018 • Sangkug Lym, Armand Behroozi, Wei Wen, Ge Li, Yongkee Kwon, Mattan Erez
Training convolutional neural networks (CNNs) requires intense computations and high memory bandwidth.