no code implementations • 21 Apr 2023 • Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li
It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains.
no code implementations • 24 Jul 2018 • Minjie Wang, Chien-chin Huang, Jinyang Li
This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint.
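The memory argument behind partitioning a model across GPUs can be illustrated with a toy example. The sketch below splits one weight matrix column-wise across k simulated "devices" (plain Python lists, not real GPUs), so each device stores only 1/k of the parameters. Concatenating the per-device outputs reproduces the unpartitioned product. This is an assumption-laden illustration of the general idea only. It is not Tofu's method, which searches for partition plans over a whole dataflow graph. The names `split_columns` and `partitioned_matmul` are hypothetical.

```python
# Toy illustration of splitting one layer's weights across k "devices".
# NOT Tofu's algorithm: it only shows why per-device memory shrinks by ~1/k.

def matmul(a, b):
    """Plain matrix product of a (n x m) and b (m x p) as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def num_params(m):
    """Number of scalar entries stored in a nested-list matrix."""
    return sum(len(row) for row in m)

def split_columns(w, k):
    """Hypothetical helper: split w into k equal column blocks (one per device)."""
    cols = len(w[0])
    assert cols % k == 0, "column count must be divisible by k"
    step = cols // k
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(k)]

def partitioned_matmul(x, w, k):
    """Each device d computes x @ w_d; results are concatenated column-wise."""
    shards = split_columns(w, k)
    partial = [matmul(x, shard) for shard in shards]  # one result per device
    out = [sum((p[i] for p in partial), []) for i in range(len(x))]
    return out, shards

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
out, shards = partitioned_matmul(x, w, 2)
print(out == matmul(x, w))               # partitioned result matches the full product
print([num_params(s) for s in shards])   # each device holds half of the 8 weights
```

Column splitting is chosen here because each shard's output needs no cross-device reduction. Row splitting would instead require summing partial results across devices.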
no code implementations • 10 May 2018 • Minjie Wang, Chien-chin Huang, Jinyang Li
We present this automatic tiling in a new system, SoyBean, that can act as a backend for TensorFlow, MXNet, and others.
no code implementations • 9 Dec 2015 • Jorge Ortiz, Chien-chin Huang, Supriyo Chakraborty
In this paper, we show that by combining the computing power distributed over a number of phones, judicious optimization choices, and contextual information, it is possible to execute the end-to-end pipeline efficiently and entirely on the phones at the edge of the network.