no code implementations • 6 Feb 2024 • Shinan Liu, Ted Shaowang, Gerry Wan, Jeewon Chae, Jonatas Marques, Sanjay Krishnan, Nick Feamster
ServeFlow is able to make inferences on 76.3% of flows in under 16ms, a 40.5x speed-up in median end-to-end serving latency, while increasing the service rate and maintaining similar accuracy.