1 code implementation • 2 Feb 2024 • Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman
Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality.
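The core idea of speculative decoding is draft-then-verify: a small draft model cheaply proposes several tokens, and the large target model checks them, so output quality matches decoding with the target model alone. A minimal sketch of the greedy-verification variant, with hypothetical `draft_next`/`target_next` callables standing in for the two models:

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One speculative-decoding step (greedy-verification sketch).

    `draft_next` and `target_next` are stand-ins for a cheap draft model
    and the expensive target model; each maps a token sequence to its
    next token. The target accepts the longest drafted prefix matching
    its own greedy choices, then emits one token of its own, so the
    output is identical to pure target-model decoding.
    """
    # Phase 1: draft k tokens with the cheap model.
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        drafted.append(t)
        ctx.append(t)

    # Phase 2: verify against the target model.
    accepted, ctx = [], list(prefix)
    for t in drafted:
        expected = target_next(ctx)
        if t == expected:          # draft agreed: accept for free
            accepted.append(t)
            ctx.append(t)
        else:                      # disagreement: take the target's token, stop
            accepted.append(expected)
            break
    else:                          # every draft token accepted: bonus token
        accepted.append(target_next(ctx))
    return accepted
```

When draft and target agree, each step yields up to k+1 tokens for roughly one target-model pass; when they disagree, the step still makes progress with the target's own token, which is why quality is never sacrificed.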
no code implementations • 30 Oct 2023 • Minghao Yan, Hongyi Wang, Shivaram Venkataraman
As neural networks (NNs) are deployed across diverse sectors, their energy demands grow correspondingly.
no code implementations • 29 Jan 2022 • Minghao Yan, Nicholas Meisburger, Tharun Medini, Anshumali Shrivastava
We show that, with communication reduced by sparsity, we can train a model with close to a billion parameters on simple 4-16-core CPU nodes connected by a basic low-bandwidth interconnect.
no code implementations • 15 Jun 2021 • Zhaozhuo Xu, Minghao Yan, Junyan Zhang, Anshumali Shrivastava
The dot product self-attention in Transformer allows us to model interactions between words.
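The mechanism referred to here is the standard scaled dot-product attention, softmax(QKᵀ/√d_k)V: each output row is a weighted average of value rows, with weights given by query-key dot products, which is how pairwise word interactions are modeled. A minimal NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    Returns (n_q, d_v): each query's output is a convex combination of
    the value rows, weighted by scaled query-key similarity.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

In self-attention, Q, K, and V are all linear projections of the same word embeddings, so the weight matrix directly scores every word against every other word.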
1 code implementation • 10 Oct 2019 • Gaurav Gupta, Minghao Yan, Benjamin Coleman, Bryce Kille, R. A. Leo Elworth, Tharun Medini, Todd Treangen, Anshumali Shrivastava
Interestingly, it is a count-min sketch type arrangement of a membership testing utility (Bloom Filter in our case).
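The arrangement can be pictured as a grid of R repetitions by B cells, where each cell holds a Bloom filter: every set id hashes to one cell per repetition, and a query item's candidate sets are those whose cells all report membership. A hypothetical minimal sketch of this idea (not the paper's tuned implementation; `Bloom`, cell counts, and hash choices here are illustrative):

```python
import hashlib

class Bloom:
    """Tiny Bloom filter over strings: k hashed bit positions in m bits."""
    def __init__(self, m=1024, k=3):
        self.bits, self.m, self.k = 0, m, k

    def _positions(self, item):
        for seed in range(self.k):
            h = hashlib.blake2b(item.encode(),
                                salt=seed.to_bytes(8, "big")).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

class BloomGrid:
    """Count-min-sketch-style grid of Bloom filters (illustrative).

    R repetitions x B cells; a set id hashes to one cell per repetition,
    and the set's items are inserted into those R filters. Querying
    intersects the cells' membership answers across repetitions, which
    drives down the false-positive rate, as in a count-min sketch.
    """
    def __init__(self, R=3, B=4):
        self.R, self.B = R, B
        self.grid = [[Bloom() for _ in range(B)] for _ in range(R)]

    def _cell(self, set_id, r):
        h = hashlib.blake2b(f"{r}:{set_id}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.B

    def insert(self, set_id, item):
        for r in range(self.R):
            self.grid[r][self._cell(set_id, r)].add(item)

    def query(self, item, set_ids):
        """Candidate sets (among `set_ids`) that may contain `item`."""
        return [s for s in set_ids
                if all(item in self.grid[r][self._cell(s, r)]
                       for r in range(self.R))]
```

Because cells are shared among the sets that hash to them, a match in a single repetition can be a collision; requiring agreement across all R repetitions is what makes the structure behave like a count-min sketch built from membership testers.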