Search Results for author: Aniruddha Nrusimha

Found 6 papers, 4 papers with code

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

1 code implementation · 4 Apr 2024 · Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim

We show that regularizing both the inputs and outputs is crucial for preventing the model from "migrating" the difficulty of input quantization into the weights, which makes post-training quantization (PTQ) of the weights more difficult.

Language Modelling · Quantization
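The abstract above concerns outlier activation channels, which dominate the quantization scale and destroy precision for the remaining values. A minimal sketch of that failure mode, using a toy symmetric int8 quantizer (the functions and values here are illustrative assumptions, not the paper's method):

```python
def quantize_int8(xs):
    # Symmetric per-tensor int8 quantization: the largest magnitude
    # sets the scale, so a single outlier stretches the whole grid.
    scale = max(abs(v) for v in xs) / 127 or 1.0
    return [round(v / scale) for v in xs], scale

def dequantize(qs, scale):
    return [q * scale for q in qs]

# An outlier channel (8.0) coarsens the grid for the small values;
# activation regularization aims to shrink such outliers during training.
with_outlier = [0.1, -0.2, 0.15, 8.0]
clean = [0.1, -0.2, 0.15]
```

With the outlier present, the reconstruction error on the small entries is roughly an order of magnitude larger than without it, which is the motivation for regularizing activations before quantizing.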

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

no code implementations · 7 Feb 2024 · Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon

In this work, we propose Hydra heads, a sequentially dependent, drop-in replacement for standard draft heads that significantly improves speculation accuracy.
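The distinction the abstract draws can be sketched schematically: Medusa-style draft heads each predict a future token from the base model's hidden state alone, while Hydra-style heads also condition on the token drafted just before them. The toy `head` functions and the integer "embeddings" below are illustrative assumptions, not the paper's architecture:

```python
def medusa_draft(base_hidden, heads):
    # Independent draft heads: each sees only the base hidden state,
    # so every speculated token ignores the ones drafted before it.
    return [head(base_hidden, None) for head in heads]

def hydra_draft(base_hidden, heads, embed):
    # Sequentially dependent draft heads: each head also receives the
    # embedding of the previously drafted token, improving accuracy.
    drafted, prev = [], None
    for head in heads:
        tok = head(base_hidden, prev)
        drafted.append(tok)
        prev = embed(tok)
    return drafted
```

Because Hydra heads are a drop-in replacement, only the per-head input changes; the verification step of speculative decoding is unaffected.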

Striped Attention: Faster Ring Attention for Causal Transformers

1 code implementation · 15 Nov 2023 · William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

In experiments running Striped Attention on A100 GPUs and TPUv4s, we are able to achieve up to 1.45x end-to-end throughput improvements over the original Ring Attention algorithm on causal transformer training at a sequence length of 256k.
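The core idea behind the speedup can be sketched with the token-to-device assignment alone. Under a causal mask, contiguous Ring Attention blocks give devices holding later positions far more attention work per round; a striped assignment interleaves early and late positions on every device, balancing the load. The partition functions and the work metric below are a simplified illustration, not the paper's implementation:

```python
def ring_partition(n_tokens, n_devices):
    # Contiguous blocks (Ring Attention): device 0 holds the earliest
    # tokens, so under a causal mask later devices do much more work.
    size = n_tokens // n_devices
    return [list(range(d * size, (d + 1) * size)) for d in range(n_devices)]

def striped_partition(n_tokens, n_devices):
    # Striped assignment: token i goes to device i % n_devices, mixing
    # early and late positions so causal-mask work is balanced.
    return [list(range(d, n_tokens, n_devices)) for d in range(n_devices)]

def causal_work(tokens):
    # Number of key positions each query position attends to.
    return sum(t + 1 for t in tokens)
```

The imbalance (max minus min per-device work) is much smaller for the striped layout, which is where the end-to-end throughput gain comes from.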

Towards Verifiable Text Generation with Symbolic References

no code implementations · 15 Nov 2023 · Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim

LLMs are vulnerable to hallucinations, and thus their outputs generally require laborious human verification for high-stakes applications.

Question Answering · Text Generation

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

2 code implementations · 7 Oct 2019 · Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

We formalize the problem of trading off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.
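The time/memory trade-off the abstract formalizes can be illustrated with the simple checkpointing strategy it generalizes: keep only every k-th layer's activation and recompute the rest during the backward pass. The unit-cost accounting below is an illustrative assumption, not Checkmate's optimization (which solves the general rematerialization problem exactly):

```python
def rematerialization_costs(n_layers, segment):
    # Keep one checkpoint per segment of `segment` layers, plus the
    # current segment's activations while backpropagating through it.
    checkpoints = -(-n_layers // segment)   # ceil division
    peak_memory = checkpoints + segment     # in "one activation" units
    extra_compute = n_layers - checkpoints  # forward ops redone later
    return peak_memory, extra_compute
```

For a 16-layer chain, segment length 4 roughly halves peak activation memory relative to storing everything, at the cost of recomputing 12 layers; Checkmate searches over such schedules optimally rather than using a fixed segment length.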
