Search Results for author: Aniruddha Nrusimha

Found 6 papers, 4 papers with code

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

1 code implementation • 4 Apr 2024 • Aniruddha Nrusimha, Mayank Mishra, Naigang Wang, Dan Alistarh, Rameswar Panda, Yoon Kim

We show that regularizing both the inputs and outputs is crucial for preventing a model's "migrating" the difficulty in input quantization to the weights, which makes post-training quantization (PTQ) of weights more difficult.

Language Modelling Quantization

Paper
Code

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

no code implementations • 7 Feb 2024 • Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon

In this work, we propose Hydra heads, a sequentially dependent, drop-in replacement for standard draft heads that significantly improves speculation accuracy.

Paper
Add Code

Striped Attention: Faster Ring Attention for Causal Transformers

1 code implementation • 15 Nov 2023 • William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

In experiments running Striped Attention on A100 GPUs and TPUv4s, we are able to achieve up to 1. 45x end-to-end throughput improvements over the original Ring Attention algorithm on causal transformer training at a sequence length of 256k.

Paper
Code

Towards Verifiable Text Generation with Symbolic References

no code implementations • 15 Nov 2023 • Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim

LLMs are vulnerable to hallucinations, and thus their outputs generally require laborious human verification for high-stakes applications.

Question Answering Text Generation

Paper
Add Code

Integer-only Zero-shot Quantization for Efficient Speech Recognition

1 code implementation • 31 Mar 2021 • Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer

End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

2 code implementations • 7 Oct 2019 • Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

We formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.

124

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.