1 code implementation • 2 Feb 2024 • Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman
Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality.
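The core idea of speculative decoding is draft-then-verify: a small draft model cheaply proposes several tokens, and the large target model checks them, so output quality matches decoding with the target model alone. A minimal sketch of the greedy-verification variant, with hypothetical `draft_next`/`target_next` callables standing in for the two models:

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One speculative-decoding step (greedy-verification sketch).

    `draft_next` and `target_next` are stand-ins for a cheap draft model
    and the expensive target model; each maps a token sequence to its
    next token. The target accepts the longest drafted prefix matching
    its own greedy choices, then emits one token of its own, so the
    output is identical to pure target-model decoding.
    """
    # Phase 1: draft k tokens with the cheap model.
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        drafted.append(t)
        ctx.append(t)

    # Phase 2: verify against the target model.
    accepted, ctx = [], list(prefix)
    for t in drafted:
        expected = target_next(ctx)
        if t == expected:          # draft agreed: accept for free
            accepted.append(t)
            ctx.append(t)
        else:                      # disagreement: take the target's token, stop
            accepted.append(expected)
            break
    else:                          # every draft token accepted: bonus token
        accepted.append(target_next(ctx))
    return accepted
```

When draft and target agree, each step yields up to k+1 tokens for roughly one target-model pass; when they disagree, the step still makes progress with the target's own token, which is why quality is never sacrificed.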
no code implementations • 30 Oct 2023 • Minghao Yan, Hongyi Wang, Shivaram Venkataraman
As neural networks (NNs) are deployed across diverse sectors, their energy demands grow correspondingly.
no code implementations • 29 Jan 2022 • Minghao Yan, Nicholas Meisburger, Tharun Medini, Anshumali Shrivastava
We show that, with communication reduced by sparsity, we can train a model with close to a billion parameters on simple 4-16-core CPU nodes connected by a basic low-bandwidth interconnect.
no code implementations • 15 Jun 2021 • Zhaozhuo Xu, Minghao Yan, Junyan Zhang, Anshumali Shrivastava
The dot product self-attention in Transformer allows us to model interactions between words.
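The mechanism referred to here is the standard scaled dot-product attention, softmax(QKᵀ/√d_k)V: each output row is a weighted average of value rows, with weights given by query-key dot products, which is how pairwise word interactions are modeled. A minimal NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    Returns (n_q, d_v): each query's output is a convex combination of
    the value rows, weighted by scaled query-key similarity.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

In self-attention, Q, K, and V are all linear projections of the same word embeddings, so the weight matrix directly scores every word against every other word.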
1 code implementation • 10 Oct 2019 • Gaurav Gupta, Minghao Yan, Benjamin Coleman, Bryce Kille, R. A. Leo Elworth, Tharun Medini, Todd Treangen, Anshumali Shrivastava
Interestingly, it is a count-min sketch type arrangement of a membership testing utility (Bloom Filter in our case).
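The arrangement can be pictured as a grid of R repetitions by B cells, where each cell holds a Bloom filter: every set id hashes to one cell per repetition, and a query item's candidate sets are those whose cells all report membership. A hypothetical minimal sketch of this idea (not the paper's tuned implementation; `Bloom`, cell counts, and hash choices here are illustrative):

```python
import hashlib

class Bloom:
    """Tiny Bloom filter over strings: k hashed bit positions in m bits."""
    def __init__(self, m=1024, k=3):
        self.bits, self.m, self.k = 0, m, k

    def _positions(self, item):
        for seed in range(self.k):
            h = hashlib.blake2b(item.encode(),
                                salt=seed.to_bytes(8, "big")).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

class BloomGrid:
    """Count-min-sketch-style grid of Bloom filters (illustrative).

    R repetitions x B cells; a set id hashes to one cell per repetition,
    and the set's items are inserted into those R filters. Querying
    intersects the cells' membership answers across repetitions, which
    drives down the false-positive rate, as in a count-min sketch.
    """
    def __init__(self, R=3, B=4):
        self.R, self.B = R, B
        self.grid = [[Bloom() for _ in range(B)] for _ in range(R)]

    def _cell(self, set_id, r):
        h = hashlib.blake2b(f"{r}:{set_id}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.B

    def insert(self, set_id, item):
        for r in range(self.R):
            self.grid[r][self._cell(set_id, r)].add(item)

    def query(self, item, set_ids):
        """Candidate sets (among `set_ids`) that may contain `item`."""
        return [s for s in set_ids
                if all(item in self.grid[r][self._cell(s, r)]
                       for r in range(self.R))]
```

Because cells are shared among the sets that hash to them, a match in a single repetition can be a collision; requiring agreement across all R repetitions is what makes the structure behave like a count-min sketch built from membership testers.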