no code implementations • 27 Feb 2024 • Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg
We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing.
no code implementations • 30 Nov 2023 • Thiago Garrett, Weijia Song, Roman Vitenberg, Ken Birman
ML inference workflows often require low latency and high throughput, yet we lack good options for addressing this need.
1 code implementation • 29 Nov 2023 • Weijia Song, Thiago Garrett, Yuting Yang, Mingzhao Liu, Edward Tremel, Lorenzo Rosa, Andrea Merlina, Roman Vitenberg, Ken Birman
Interactive intelligent computing applications are increasingly prevalent, creating a need for AI/ML platforms optimized to reduce per-event latency while maintaining high throughput and efficient resource management.