Search Results for author: Sameer Kumar

Found 7 papers, 1 paper with code

ST-MoE: Designing Stable and Transferable Sparse Expert Models

2 code implementations • 17 Feb 2022 • Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus

Advancing the state of the art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning.

Common Sense Reasoning, Coreference Resolution, +6 more

Exploring the limits of Concurrency in ML Training on Google TPUs

no code implementations • 7 Nov 2020 • Sameer Kumar, James Bradbury, Cliff Young, Yu Emma Wang, Anselm Levskaya, Blake Hechtman, Dehao Chen, HyoukJoong Lee, Mehmet Deveci, Naveen Kumar, Pankaj Kanwar, Shibo Wang, Skye Wanderman-Milne, Steve Lacy, Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing

Recent results in language understanding using neural networks have required training hardware of unprecedented scale, with thousands of chips cooperating on a single training run.

Highly Available Data Parallel ML training on Mesh Networks

no code implementations • 6 Nov 2020 • Sameer Kumar, Norm Jouppi

Packets must be routed around the failed chips for full connectivity.
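The abstract excerpt above mentions routing packets around failed chips to preserve full connectivity on a mesh network. As an illustrative sketch only (not the paper's actual routing algorithm), a breadth-first search on a 2D torus that skips failed nodes shows how wraparound links let traffic detour around a failure:

```python
from collections import deque

def route_around_failures(grid_dim, src, dst, failed):
    """Find a shortest hop path on a 2D torus, avoiding failed chips.

    grid_dim: (rows, cols); src/dst: (row, col) tuples;
    failed: set of (row, col) chips to route around.
    Returns the list of chips from src to dst, or None if unreachable.
    """
    rows, cols = grid_dim
    if src in failed or dst in failed:
        return None
    queue = deque([src])
    parent = {src: None}          # also serves as the visited set
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = node
        # Torus links wrap around in both dimensions.
        for nxt in (((r + 1) % rows, c), ((r - 1) % rows, c),
                    (r, (c + 1) % cols), (r, (c - 1) % cols)):
            if nxt not in parent and nxt not in failed:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

For example, on a 4x4 torus with chip (0, 1) down, traffic from (0, 0) to (0, 2) can still take a two-hop route through the wraparound column link instead of the direct neighbor.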

Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour

no code implementations • 30 Oct 2020 • Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc Le, Yang You, Sameer Kumar

EfficientNets are a family of state-of-the-art image classification models based on efficiently scaled convolutional neural networks.

Image Classification, Playing the Game of 2048
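The "efficiently scaled" networks mentioned in this entry refer to the compound-scaling rule from the original EfficientNet paper, which grows depth, width, and input resolution together from a single coefficient φ. A minimal sketch (α=1.2, β=1.1, γ=1.15 are the published base coefficients; the helper function itself is illustrative, not code from this paper):

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Compound scaling: for coefficient phi, scale network depth by
    alpha**phi, channel width by beta**phi, and input resolution by
    gamma**phi. The base coefficients satisfy alpha * beta**2 * gamma**2
    ~= 2, so total FLOPs roughly double for each unit increase in phi.
    """
    depth = alpha ** phi
    width = beta ** phi
    resolution = gamma ** phi
    return depth, width, resolution
```

At φ=0 this recovers the baseline model (all multipliers 1.0); larger φ values produce the bigger EfficientNet variants.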

Scale MLPerf-0.6 models on Google TPU-v3 Pods

no code implementations • 21 Sep 2019 • Sameer Kumar, Victor Bitorff, Dehao Chen, Chiachen Chou, Blake Hechtman, HyoukJoong Lee, Naveen Kumar, Peter Mattson, Shibo Wang, Tao Wang, Yuanzhong Xu, Zongwei Zhou

The recent submission of Google TPU-v3 Pods to the industry-wide MLPerf v0.6 training benchmark demonstrates the scalability of a suite of industry-relevant ML models.

Benchmarking

Image Classification at Supercomputer Scale

no code implementations • 16 Nov 2018 • Chris Ying, Sameer Kumar, Dehao Chen, Tao Wang, Youlong Cheng

Deep learning is extremely computationally intensive, and hardware vendors have responded by building faster accelerators in large clusters.

Classification, General Classification, +1 more

PowerAI DDL

no code implementations • 7 Aug 2017 • Minsik Cho, Ulrich Finkler, Sameer Kumar, David Kung, Vaibhav Saxena, Dheeraj Sreedhar

We train ResNet-101 on ImageNet-22K with 64 IBM Power8 S822LC servers (256 GPUs) in about 7 hours to 33.8% validation accuracy.
