no code implementations • 2 Feb 2024 • Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman
However, our experiments indicate the contrary, with throughput diminishing as the probability that generated tokens are accepted by the target model increases.
no code implementations • 30 Oct 2023 • Minghao Yan, Hongyi Wang, Shivaram Venkataraman
As neural networks (NNs) are deployed across diverse sectors, their energy demand grows correspondingly.
no code implementations • 29 Jan 2022 • Minghao Yan, Nicholas Meisburger, Tharun Medini, Anshumali Shrivastava
We show that, with communication reduced due to sparsity, we can train a model with close to a billion parameters on simple 4-16-core CPU nodes connected by a basic low-bandwidth interconnect.
no code implementations • 15 Jun 2021 • Zhaozhuo Xu, Minghao Yan, Junyan Zhang, Anshumali Shrivastava
The dot-product self-attention in the Transformer allows us to model interactions between words.
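The interaction modeling referred to here can be sketched in a few lines; this is a minimal illustration of dot-product self-attention, not the paper's method, and it simplifies by using the raw embeddings as queries, keys, and values (a real Transformer applies learned linear projections first):

```python
import numpy as np

def dot_product_self_attention(X):
    """Minimal dot-product self-attention over token embeddings X of shape (n, d).

    Simplification (assumption): queries, keys, and values are all X itself.
    """
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)                 # pairwise word-word interaction scores
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over each row
    return weights @ X                            # each word becomes a weighted mix of all words

# Usage: 4 "words" with 8-dimensional embeddings
X = np.random.default_rng(0).standard_normal((4, 8))
out = dot_product_self_attention(X)
```

Each output row is a convex combination of all input rows, which is exactly why every word can attend to every other word in a single layer.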
1 code implementation • 10 Oct 2019 • Gaurav Gupta, Minghao Yan, Benjamin Coleman, Bryce Kille, R. A. Leo Elworth, Tharun Medini, Todd Treangen, Anshumali Shrivastava
Interestingly, it is a count-min-sketch-style arrangement of a membership-testing utility (a Bloom filter, in our case).
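A count-min-sketch-style arrangement of Bloom filters can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the class names, parameters (`B` groups, `R` repetitions), and sizing are assumptions chosen for clarity.

```python
import hashlib
import random

class BloomFilter:
    """Basic Bloom filter for set-membership testing."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _hashes(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = True

    def __contains__(self, item):
        return all(self.bits[h] for h in self._hashes(item))

class CountMinOfBlooms:
    """Hypothetical count-min-sketch-style layout: R independent random
    partitions of n_sets sets into B groups, with one Bloom filter per group.
    Querying intersects the matching groups across repetitions."""
    def __init__(self, n_sets, B=4, R=3, seed=0):
        rng = random.Random(seed)
        # group_of[r][s] = which of the B cells set s maps to in repetition r
        self.group_of = [[rng.randrange(B) for _ in range(n_sets)] for _ in range(R)]
        self.tables = [[BloomFilter() for _ in range(B)] for _ in range(R)]
        self.n_sets = n_sets

    def insert(self, item, set_id):
        for r in range(len(self.tables)):
            self.tables[r][self.group_of[r][set_id]].add(item)

    def query(self, item):
        # Intersecting candidates across repetitions shrinks false positives,
        # exactly as repeated hashing does in a count-min sketch.
        candidates = set(range(self.n_sets))
        for r, table in enumerate(self.tables):
            hits = {s for s in range(self.n_sets)
                    if item in table[self.group_of[r][s]]}
            candidates &= hits
        return candidates

# Usage: index the k-mer "ACGT" as belonging to set 2 out of 8 sets
cm = CountMinOfBlooms(n_sets=8)
cm.insert("ACGT", set_id=2)
result = cm.query("ACGT")  # set 2 is always among the candidates
```

The design trade-off mirrors a count-min sketch: storage grows with B × R rather than with the number of sets, while the intersection across R independent partitions drives the false-positive rate down geometrically.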