26 May 2021 • Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
That is, when combined with sparse attention, our sequence parallelism enables us to train Transformers on arbitrarily long sequences.
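The claim rests on splitting the input along the sequence dimension so that each device holds only a chunk of the tokens, keeping per-device memory roughly constant as the sequence grows. The sketch below is not the authors' implementation; it only illustrates that sequence-dimension splitting under assumed names (`split_sequence`, a fixed `world_size`/`rank`) and does not include the cross-device attention exchange the paper describes.

```python
# Minimal sketch of sequence-dimension splitting (hypothetical helper, not the
# paper's code): each rank keeps only its own chunk of the token dimension.

import torch


def split_sequence(hidden_states: torch.Tensor, world_size: int, rank: int) -> torch.Tensor:
    """Return the sequence chunk owned by `rank`.

    hidden_states: (batch, seq_len, hidden) tensor holding the full sequence.
    """
    seq_len = hidden_states.size(1)
    assert seq_len % world_size == 0, "sequence length must divide evenly across ranks"
    chunk = seq_len // world_size
    return hidden_states[:, rank * chunk:(rank + 1) * chunk, :]


if __name__ == "__main__":
    # Example: a 32k-token sequence split across 8 devices leaves each device
    # with a 4k-token chunk; attention across chunks would then require
    # exchanging keys/values between ranks, which sparse attention bounds.
    x = torch.randn(2, 32768, 512)                # (batch, seq_len, hidden)
    local = split_sequence(x, world_size=8, rank=0)
    print(local.shape)                            # torch.Size([2, 4096, 512])
```

In a real training run the chunking would be driven by the distributed process group rather than hard-coded rank arguments, and the attention layers would communicate key/value chunks between devices so that each token can still attend across the full (sparse) attention pattern.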