Search Results for author: Mihail Tarta

Found 1 paper, 0 papers with code

Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving

no code implementations • 30 Nov 2023 • Ankit Bhardwaj, Amar Phanishayee, Deepak Narayanan, Mihail Tarta, Ryan Stutsman

We present Packrat, a new serving system for online inference that, given a model and batch size ($B$), algorithmically picks the number of instances ($i$), the number of threads each should be allocated ($t$), and the batch size each should operate on ($b$) so as to minimize latency.
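
To make the $\langle i, t, b \rangle$ selection problem concrete, here is a minimal sketch of an exhaustive search over uniform configurations. It is not Packrat's algorithm, which the abstract does not describe. It assumes that all instances run concurrently without interfering with one another, so the latency of the whole batch equals the latency of one instance. The names `pick_uniform_config` and `toy_latency`, and the latency model itself, are hypothetical and chosen only for illustration; a real system would use measured latencies.

```python
from typing import Callable, Tuple


def pick_uniform_config(
    total_threads: int,
    batch_size: int,
    latency: Callable[[int, int], float],
) -> Tuple[float, int, int, int]:
    """Return (latency, i, t, b) for the best uniform split.

    Every instance gets the same thread count t and the same batch size b.
    Assumption: instances run in parallel and do not interfere, so the
    batch latency is the latency of a single instance.
    """
    best = None
    # More instances than threads or than batch items would leave some idle.
    for i in range(1, min(total_threads, batch_size) + 1):
        t = total_threads // i      # threads per instance (floor)
        b = -(-batch_size // i)     # items per instance (ceiling)
        cost = latency(t, b)
        if best is None or cost < best[0]:
            best = (cost, i, t, b)
    return best


def toy_latency(t: int, b: int) -> float:
    """Hypothetical model: fixed overhead plus work with sublinear thread scaling."""
    return 1.0 + b / (t ** 0.7)


if __name__ == "__main__":
    cost, i, t, b = pick_uniform_config(16, 16, toy_latency)
    print(f"i={i}, t={t}, b={b}, latency={cost:.3f}")
```

Under this toy model, threads scale sublinearly, so many small instances beat one large instance. With a different latency function the best split can change, which is why the choice has to be driven by the model and batch size at hand.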
