Search Results for author: Zachary Ankner

Found 5 papers, 1 paper with code

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

no code implementations • 7 Feb 2024 • Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon

In this work, we propose Hydra heads, a sequentially dependent, drop-in replacement for standard draft heads that significantly improves speculation accuracy.
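A minimal sketch of the contrast described above, assuming a Medusa-style setup in which independent draft heads each predict a future token from the base model's last hidden state, while the sequentially dependent variant also conditions each head on the tokens drafted so far. Module names, shapes, and the way drafted tokens are fed back are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class IndependentDraftHeads(nn.Module):
    """Medusa-style: every head predicts its token from the same hidden state alone."""
    def __init__(self, hidden_dim, vocab_size, num_heads):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, vocab_size) for _ in range(num_heads))

    def forward(self, hidden):                      # hidden: (batch, hidden_dim)
        return [head(hidden) for head in self.heads]

class SequentialDraftHeads(nn.Module):
    """Hydra-style idea: each head also sees an embedding of the tokens drafted so far."""
    def __init__(self, hidden_dim, vocab_size, num_heads):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, vocab_size) for _ in range(num_heads))

    def forward(self, hidden):                      # hidden: (batch, hidden_dim)
        logits, state = [], hidden
        for head in self.heads:
            step_logits = head(state)
            logits.append(step_logits)
            drafted = step_logits.argmax(dim=-1)    # greedy draft token for this position
            state = state + self.embed(drafted)     # condition the next head on it
        return logits

# Toy usage: four speculated positions, each returned as a set of logits.
heads = SequentialDraftHeads(hidden_dim=64, vocab_size=1000, num_heads=4)
drafts = heads(torch.randn(2, 64))
```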

Striped Attention: Faster Ring Attention for Causal Transformers

1 code implementation • 15 Nov 2023 • William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

In experiments running Striped Attention on A100 GPUs and TPUv4s, we are able to achieve up to 1.45x end-to-end throughput improvements over the original Ring Attention algorithm on causal transformer training at a sequence length of 256k.
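As a rough illustration of the load-balancing idea, the sketch below compares a contiguous (Ring Attention-style) partition of a causal sequence with a round-robin ("striped") partition, counting how many (query, key) pairs each device must score under the causal mask. The helper names and toy sizes are assumptions for illustration; the actual algorithm balances work per ring communication step rather than in aggregate.

```python
# Rough illustration only, not the paper's implementation.

def contiguous_partition(seq_len, num_devices):
    """Each device holds one contiguous block of token positions (Ring Attention)."""
    block = seq_len // num_devices
    return [list(range(d * block, (d + 1) * block)) for d in range(num_devices)]

def striped_partition(seq_len, num_devices):
    """Token positions are dealt out round-robin ("striped") across devices."""
    return [list(range(d, seq_len, num_devices)) for d in range(num_devices)]

def causal_work(queries, keys):
    """Number of (query, key) pairs that survive the causal mask."""
    return sum(1 for q in queries for k in keys if k <= q)

if __name__ == "__main__":
    seq_len, num_devices = 16, 4
    for name, partition in (("contiguous", contiguous_partition(seq_len, num_devices)),
                            ("striped", striped_partition(seq_len, num_devices))):
        # Total masked attention work each device does across all key blocks.
        loads = [sum(causal_work(q, k) for k in partition) for q in partition]
        print(f"{name:10s} per-device work: {loads}")
    # contiguous per-device work: [10, 26, 42, 58]  (skewed toward devices holding late positions)
    # striped    per-device work: [28, 32, 36, 40]  (much more even)
```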

Dynamic Masking Rate Schedules for MLM Pretraining

no code implementations • 24 May 2023 • Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%.

Tasks: Language Modelling, Masked Language Modeling, +1
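A minimal sketch of what a dynamic masking rate schedule for MLM pretraining could look like, assuming a simple linear decay from a higher initial rate down to BERT's fixed 15%. The schedule shape, endpoints, and helper names are illustrative assumptions rather than the paper's reported recipe.

```python
import random

def masking_rate(step, total_steps, start_rate=0.30, end_rate=0.15):
    """Linearly anneal the masking rate over the course of training (assumed schedule)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start_rate + frac * (end_rate - start_rate)

def mask_tokens(token_ids, rate, mask_id=103):
    """Replace each token with [MASK] independently with probability `rate`."""
    masked, labels = [], []
    for t in token_ids:
        if random.random() < rate:
            masked.append(mask_id)
            labels.append(t)      # predict the original token at this position
        else:
            masked.append(t)
            labels.append(-100)   # ignored by the MLM loss
    return masked, labels

# Example: the rate decays from 0.30 at step 0 toward 0.15 at the final step.
for step in (0, 5000, 10000):
    print(step, round(masking_rate(step, 10000), 3))
```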

The Effect of Data Dimensionality on Neural Network Prunability

no code implementations • 1 Dec 2022 • Zachary Ankner, Alex Renda, Gintare Karolina Dziugaite, Jonathan Frankle, Tian Jin

Practitioners prune neural networks for efficiency gains and generalization improvements, but few scrutinize the factors determining the prunability of a neural network: the maximum fraction of weights that pruning can remove without compromising the model's test accuracy.
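A minimal sketch of how prunability, as defined above, could be estimated empirically: sweep pruning fractions and keep the largest one whose test accuracy stays within a tolerance of the dense model. The tolerance, the sweep grid, and one-shot magnitude pruning with no retraining are simplifying assumptions, not the paper's protocol.

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Zero out (approximately) the given fraction of smallest-magnitude weights."""
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def prunability(weights, evaluate, tolerance=0.005, grid=np.linspace(0.0, 0.95, 20)):
    """Largest pruning fraction whose accuracy is within `tolerance` of the dense model."""
    dense_acc = evaluate(weights)
    best = 0.0
    for frac in grid:
        if evaluate(magnitude_prune(weights, frac)) >= dense_acc - tolerance:
            best = frac
    return best

# Toy usage with a stand-in evaluation; a real study would train and evaluate a network.
w = np.random.default_rng(0).normal(size=(256, 256))
print(prunability(w, evaluate=lambda m: 0.9 - 0.1 * (np.count_nonzero(m == 0) / m.size) ** 4))
```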
