Search Results for author: Sophia Shao

Found 2 papers, 1 paper with code

SPEED: Speculative Pipelined Execution for Efficient Decoding

no code implementations • 18 Oct 2023 • Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

For Transformer decoders that employ parameter sharing, the memory operations for the tokens executing in parallel can be amortized, which allows us to accelerate generative LLM inference.
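The idea in that snippet is that with parameter sharing, the same weights serve every decoder layer, so several speculative tokens can be pushed through together and the cost of fetching those weights is paid once. The sketch below is a minimal, hypothetical illustration of that amortization (not the authors' implementation): a toy parameter-shared decoder that runs a batch of speculative tokens through shared weights in one pass.

```python
import numpy as np

# Toy sketch (assumption, not the SPEED codebase): one weight matrix is
# shared across all decoder layers, so a batch of speculative tokens can
# reuse a single weight fetch per layer instead of one fetch per token.

rng = np.random.default_rng(0)
d_model, n_layers, n_speculative = 64, 4, 8

# Weights shared across layers (the parameter-sharing assumption).
W_shared = rng.standard_normal((d_model, d_model))

def decode_step(tokens: np.ndarray) -> np.ndarray:
    """Run n_layers of a toy parameter-shared decoder on a token batch.

    tokens: (n_tokens, d_model). All speculative tokens go through each
    layer together, so W_shared is read once per layer for the whole batch.
    """
    h = tokens
    for _ in range(n_layers):
        h = np.tanh(h @ W_shared)  # same weights at every layer
    return h

speculative_tokens = rng.standard_normal((n_speculative, d_model))
out = decode_step(speculative_tokens)
print(out.shape)  # (8, 64)
```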

NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning

1 code implementation • 20 Sep 2019 • Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica

However, these models are unable to capture the data dependency, the computation graph, or the organization of instructions.

Distributed, Parallel, and Cluster Computing • Performance • Programming Languages
