Search Results for author: Zhanpeng Zeng

Found 6 papers, 4 papers with code

LookupFFN: Making Transformers Compute-lite for CPU inference

1 code implementation • 12 Mar 2024 • Zhanpeng Zeng, Michael Davies, Pranav Pulijala, Karthikeyan Sankaralingam, Vikas Singh

While GPU clusters are the de facto choice for training large deep neural network (DNN) models today, several reasons, including ease of workflow, security, and cost, have led to efforts investigating whether CPUs may be viable for routine inference in many sectors of industry.

Language Modelling

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

no code implementations • 12 Mar 2024 • Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh

A popular strategy is the use of low bit-width integers to approximate the original entries in a matrix.
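To make that strategy concrete, here is a minimal sketch of the generic idea the sentence describes: symmetric uniform quantization of a float matrix to b-bit signed integers. The function name quantize_to_int and the parameter choices are illustrative assumptions, not an API from the paper, whose method goes further than this plain rounding scheme.

```python
import numpy as np

def quantize_to_int(x, bits=4):
    """Symmetric uniform quantization: map the float entries of x to
    signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax           # one scale for the whole matrix
    q = np.round(x / scale).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
q, scale = quantize_to_int(A, bits=4)
A_hat = q * scale                            # dequantized approximation of A
print("max abs error:", np.abs(A - A_hat).max())
```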

FrameQuant: Flexible Low-Bit Quantization for Transformers

no code implementations • 10 Mar 2024 • Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang, Vikas Singh

If quantization is interpreted as the addition of noise, our casting of the problem allows invoking an extensive body of known consistent recovery and noise robustness guarantees.

Quantization
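The noise interpretation is easy to verify for a plain uniform quantizer: rounding never moves a value by more than half a step, so the quantized signal is exactly the original plus bounded noise. The sketch below checks this generic view under an assumed step size; it illustrates the interpretation only, not the paper's quantization scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
s = 0.1                              # assumed quantization step size
q = s * np.round(x / s)              # uniform quantizer
e = q - x                            # the induced "noise"

# The error of a uniform quantizer is bounded by half a step, so
# quantization can be modeled as the addition of bounded noise.
print("noise range:", e.min(), e.max())      # within about [-s/2, s/2]
assert np.all(np.abs(e) <= s / 2 + 1e-12)
```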

Multi Resolution Analysis (MRA) for Approximate Self-Attention

1 code implementation • 21 Jul 2022 • Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh

Transformers have emerged as a preferred model for many tasks in natural language processing and vision.

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

1 code implementation • 18 Nov 2021 • Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh

In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hashing (LSH) decreases the quadratic complexity of such models to linear.
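As a rough illustration of the LSH ingredient: with sign random projections (SimHash), two unit vectors agree on one hash bit with probability 1 - theta/pi (theta being the angle between them), so tau independent bits all agree with probability (1 - theta/pi)**tau, a similarity that Bernoulli trials can estimate. The sketch below only verifies that collision rate empirically; simhash, tau, and the trial loop are illustrative assumptions, not the paper's full linear-cost attention algorithm.

```python
import numpy as np

def simhash(x, projections):
    """Sign random projections: one hash bit per projection vector."""
    return projections @ x > 0

rng = np.random.default_rng(0)
d, tau = 32, 8                        # feature dim, hash bits per vector
q = rng.standard_normal(d); q /= np.linalg.norm(q)
k = rng.standard_normal(d); k /= np.linalg.norm(k)

# One sign projection matches with probability 1 - theta/pi, so tau
# independent bits all match with probability (1 - theta/pi)**tau --
# a quantity a Bernoulli trial samples without materializing the
# full quadratic attention matrix.
trials, hits = 10_000, 0
for _ in range(trials):
    P = rng.standard_normal((tau, d))
    hits += np.array_equal(simhash(q, P), simhash(k, P))

theta = np.arccos(np.clip(q @ k, -1.0, 1.0))
print("empirical:", hits / trials)
print("theory:   ", (1 - theta / np.pi) ** tau)
```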
