Search Results for author: Bowen Yao

Found 1 paper, 1 paper with code

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

1 code implementation • 2 Mar 2024 • Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava

Large language model inference on Central Processing Units (CPUs) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations.

16k Language Modelling +1

