Search Results for author: Chien-chin Huang

Found 5 papers, 2 papers with code

TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training

3 code implementations • 9 Oct 2024 • Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur, Stratos Idreos

By stacking training optimizations, we demonstrate accelerations of 65.08% with 1D parallelism at the 128-GPU scale (Llama 3.1 8B), an additional 12.59% with 2D parallelism at the 256-GPU scale (Llama 3.1 70B), and an additional 30% with 3D parallelism at the 512-GPU scale (Llama 3.1 405B) on NVIDIA H100 GPUs over optimized baselines.

Supporting Very Large Models using Automatic Dataflow Graph Partitioning

no code implementations • 24 Jul 2018 • Minjie Wang, Chien-chin Huang, Jinyang Li

This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint.

graph partitioning

Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling

no code implementations • 10 May 2018 • Minjie Wang, Chien-chin Huang, Jinyang Li

We present this automatic tiling in a new system, SoyBean, that can act as a backend for Tensorflow, MXNet, and others.

Get More With Less: Near Real-Time Image Clustering on Mobile Phones

no code implementations • 9 Dec 2015 • Jorge Ortiz, Chien-chin Huang, Supriyo Chakraborty

In this paper, we show that by combining the computing power distributed over a number of phones, judicious optimization choices, and contextual information it is possible to execute the end-to-end pipeline entirely on the phones at the edge of the network, efficiently.

Clustering • Image Clustering
