29 Mar 2024 • Jovan Stojkovic, Esha Choukse, Chaojie Zhang, Inigo Goiri, Josep Torrellas
Given the high compute and memory requirements of modern LLMs, ever more top-of-the-line GPUs are being deployed to serve these models.