Search Results for author: Basil Hosmer

Found 4 papers, 0 papers with code

Is Flash Attention Stable?

no code implementations • 5 May 2024 • Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

Training large-scale machine learning models poses distinct system challenges, given both the size and complexity of today's workloads.


LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

no code implementations • 25 Apr 2024 • Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu

We present LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs).

Continual Pretraining • Semantic Parsing


CHAI: Clustered Head Attention for Efficient LLM Inference

no code implementations • 12 Mar 2024 • Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu

We observe a high degree of redundancy across heads in which tokens they pay attention to.


Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

no code implementations • 22 Dec 2023 • Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

As the development of large-scale Generative AI models evolves beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and efficiency.

3D Generation

