Search Results for author: Michael Andersch

Found 2 papers, 1 paper with code

Reducing Activation Recomputation in Large Transformer Models

3 code implementations • 10 May 2022 • Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation.

Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

No code implementations • ICLR 2018 • Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie

Recurrent Neural Networks (RNNs) are powerful tools for solving sequence-based problems, but their efficacy and execution time are dependent on the size of the network.

Tasks: NMT, speech-recognition (+1 more)
