Search Results for author: Yuezhou Hu

Accelerating Transformer Pre-Training with 2:4 Sparsity

Training large Transformers is slow, but recent innovations in GPU architecture give us an advantage.