Search Results for author: Michael Andersch

Found 3 papers, 1 paper with code

Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

no code implementations · ICLR 2018 · Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie

Recurrent Neural Networks (RNNs) are powerful tools for solving sequence-based problems, but their efficacy and execution time depend on the size of the network.

Tasks: NMT, Speech Recognition, +1

Reducing Activation Recomputation in Large Transformer Models

3 code implementations · 10 May 2022 · Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation.
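For context, below is a minimal sketch of standard activation recomputation (gradient checkpointing), the memory-saving technique whose cost the paper aims to reduce. It uses PyTorch's torch.utils.checkpoint; the CheckpointedBlock module and its dimensions are hypothetical, and this does not show the paper's selective recomputation or sequence-parallelism schemes.

```python
# Sketch of plain activation recomputation (gradient checkpointing) in PyTorch.
# Activations inside the checkpointed region are discarded after the forward
# pass and recomputed during backward, trading extra compute for less memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    """A transformer-style block whose internal activations are recomputed in backward."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def _forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))

    def forward(self, x):
        # Wrap the block in checkpoint() so its intermediates are not stored.
        return checkpoint(self._forward, x, use_reentrant=False)


if __name__ == "__main__":
    block = CheckpointedBlock()
    x = torch.randn(2, 128, 256, requires_grad=True)
    loss = block(x).sum()
    loss.backward()  # triggers recomputation of the block's forward pass
    print(x.grad.shape)
```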

ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

no code implementations · 17 Apr 2024 · Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch

In this work, we conducted a comprehensive analysis of the AlphaFold training procedure based on OpenFold and identified inefficient communications and overhead-dominated computations as the key factors preventing AlphaFold training from scaling effectively.
