Search Results for author: Bowen Yao

Found 1 papers, 1 papers with code

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

1 code implementation • 2 Mar 2024 • Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava

Large language model inference on Central Processing Units (CPUs) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations.
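For context, the MAD-bound baseline this paper targets is standard scaled dot-product attention, whose two matrix products each cost on the order of n²·d multiply-adds per head. A minimal NumPy sketch of that baseline (illustrative only; this is the conventional computation, not the paper's NoMAD method):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard attention for one head.

    The two matrix products below are the MAD-heavy operations
    that dominate CPU inference cost; NoMAD-Attention aims to
    replace this multiply-add work.
    """
    d = Q.shape[-1]
    # QK^T: roughly n * n * d multiply-adds
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # weights @ V: another round of multiply-adds
    return weights @ V
```

With uniform queries and keys the softmax weights are uniform, so each output row is simply the mean of the value rows, which makes the function easy to sanity-check.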

Tasks: Language Modelling +1
