4 Nov 2020 • Michael Lui, Yavuz Yetim, Özgür Özkan, Zhuoran Zhao, Shin-Yeh Tsai, Carole-Jean Wu, Mark Hempstead
One approach to supporting this scale is distributed serving, or distributed inference, which partitions the memory footprint of a single large model across multiple servers so that no one machine must hold the entire model.
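The sharding idea behind distributed inference can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes a large embedding table whose rows are split into contiguous ranges, with each "server" holding only its own slice and lookups routed to the owning shard. All names (`EmbeddingShardServer`, `shard_table`, `distributed_lookup`) are hypothetical.

```python
class EmbeddingShardServer:
    """Hypothetical shard server holding one contiguous row range of a
    large embedding table; only this slice resides in its memory."""

    def __init__(self, row_start, row_end, dim):
        # Materialize only the rows this server owns (dummy values here).
        self.rows = {r: [float(r)] * dim for r in range(row_start, row_end)}

    def lookup(self, row):
        return self.rows[row]


def shard_table(num_rows, dim, num_servers):
    """Split [0, num_rows) into even contiguous ranges, one per server.
    Returns the rows-per-server stride and the list of shard servers."""
    per = (num_rows + num_servers - 1) // num_servers  # ceil division
    servers = [
        EmbeddingShardServer(i * per, min((i + 1) * per, num_rows), dim)
        for i in range(num_servers)
    ]
    return per, servers


def distributed_lookup(servers, rows_per_server, row):
    """Route a lookup to the server that owns the row (static range routing)."""
    return servers[row // rows_per_server].lookup(row)
```

With 1000 rows over 4 servers, each server stores only 250 rows, and a lookup for row 600 is served entirely by the third shard; the memory requirement of the full table never lands on a single machine.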