2 code implementations • 10 May 2022 • Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro
In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation.
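Activation recomputation (often called gradient checkpointing) saves memory by discarding intermediate activations during the forward pass and recomputing them during backward. Below is a minimal PyTorch sketch of that baseline technique, i.e. the recomputation cost the paper sets out to reduce; the `TransformerBlock` module, its sizes, and the loop structure are illustrative assumptions, not the paper's actual model or method.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Illustrative transformer block; hidden size and head count are arbitrary.
class TransformerBlock(nn.Module):
    def __init__(self, hidden=1024, heads=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                                 nn.Linear(4 * hidden, hidden))
        self.norm1 = nn.LayerNorm(hidden)
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

blocks = nn.ModuleList(TransformerBlock() for _ in range(4))
x = torch.randn(2, 128, 1024, requires_grad=True)

# Full activation recomputation: each block's activations are dropped after
# the forward pass and recomputed during backward, trading memory for roughly
# one extra forward pass per block. Reducing this overhead (e.g. by
# recomputing only selected activations) is the direction the paper pursues.
for block in blocks:
    x = checkpoint(block, x, use_reentrant=False)
x.sum().backward()
```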
no code implementations • ICLR 2018 • Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie
Recurrent Neural Networks (RNNs) are powerful tools for solving sequence-based problems, but their efficacy and execution time are dependent on the size of the network.