Queueing Analysis of GPU-Based Inference Servers with Dynamic Batching: A Closed-Form Characterization

An important characteristic of GPU-based inference is that computational efficiency, in terms of both processing speed and energy consumption, improves drastically when multiple jobs are processed together in a batch.
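To make the processing-speed part of this claim concrete, the following minimal sketch assumes a linear batch processing time: a fixed per-batch overhead plus a per-job cost. The functions `batch_time` and `throughput`, the parameters `alpha` and `beta`, and their numerical values are hypothetical and chosen only for illustration; they are not measurements or definitions taken from the text above.

```python
def batch_time(b, alpha=1.0, beta=4.0):
    """Assumed time (ms) to process a batch of b jobs.

    alpha: hypothetical per-job cost (ms per job)
    beta:  hypothetical fixed overhead paid once per batch (ms)
    """
    return alpha * b + beta


def throughput(b, alpha=1.0, beta=4.0):
    """Jobs completed per ms if the server always runs batches of size b."""
    return b / batch_time(b, alpha, beta)


if __name__ == "__main__":
    # Under the linear assumption, throughput equals 1 / (alpha + beta / b),
    # so it grows with b and stays below 1 / alpha.
    for b in (1, 2, 4, 8, 16, 32):
        print(f"batch size {b:2d}: {throughput(b):.3f} jobs/ms")
```

Under this assumed model the fixed overhead is amortized over more jobs as the batch grows, so throughput rises with the batch size but with diminishing returns, bounded above by `1 / alpha`.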

