Search Results for author: Zhiye Song

Found 1 paper, 1 paper with code

Striped Attention: Faster Ring Attention for Causal Transformers

1 code implementation • 15 Nov 2023 • William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

In experiments running Striped Attention on A100 GPUs and TPUv4s, we are able to achieve up to 1.45x end-to-end throughput improvements over the original Ring Attention algorithm on causal transformer training at a sequence length of 256k.

