Search Results for author: Mihail Tarta

Found 1 paper, 0 papers with code

Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving

no code implementations • 30 Nov 2023 • Ankit Bhardwaj, Amar Phanishayee, Deepak Narayanan, Mihail Tarta, Ryan Stutsman

We present Packrat, a new serving system for online inference that, given a model and batch size ($B$), algorithmically picks the optimal number of instances ($i$), the number of threads each should be allocated ($t$), and the batch size each should operate on ($b$) to minimize latency.
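
The search space described in the abstract can be illustrated with a small brute-force sketch. This is not the paper's algorithm: it assumes every instance gets the same $t$ and $b$, a fixed thread budget `T` (a symbol not in the abstract), and a made-up cost model `toy_latency` standing in for measured latencies. All function names are hypothetical.

```python
import math


def toy_latency(t, b):
    # Hypothetical cost model (an assumption, not from the paper):
    # compute time scales with b / t, plus a synchronization
    # overhead that grows with the number of threads.
    return 10.0 * b / t + 0.5 * t


def pick_config(B, T, latency=toy_latency):
    """Try every instance count i and return the lowest-latency split.

    B: total batch size, T: total threads available (assumed budget).
    Each of the i instances gets t = T // i threads and a batch of
    b = ceil(B / i) items; instances are assumed to run in parallel,
    so the configuration's latency is that of a single instance.
    """
    best = None
    for i in range(1, min(B, T) + 1):
        t = T // i
        b = math.ceil(B / i)
        lat = latency(t, b)
        if best is None or lat < best[0]:
            best = (lat, i, t, b)
    lat, i, t, b = best
    return {"instances": i, "threads": t, "batch": b, "latency": lat}


config = pick_config(B=8, T=16)
```

Under this toy model, several small instances can beat one large instance because the per-thread overhead term penalizes wide instances; with real profiled latencies the best split may look quite different.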
