1 code implementation • 24 Jun 2023 • Daniel Zou, Xinchen Jin, Xueyang Yu, Hao Zhang, James Demmel
In anticipation of workloads that involve serving many such large models to handle different tasks, we develop Computron, a system that uses memory swapping to serve multiple distributed models on a shared GPU cluster.
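A minimal sketch of the memory-swapping idea (not Computron's actual API; the model names and sizes are placeholders): keep each model's parameters resident in host memory and move only the requested model onto the GPU for the duration of a request.

```python
# Sketch of multi-model serving via memory swapping (requires a CUDA device).
import torch
import torch.nn as nn

# Stand-ins for large models, kept in host memory; names are hypothetical.
models = {
    "model_a": nn.Linear(1024, 1024),
    "model_b": nn.Linear(768, 768),
}

def serve(name: str, x: torch.Tensor) -> torch.Tensor:
    model = models[name].to("cuda")   # swap parameters onto the GPU
    with torch.no_grad():
        y = model(x.to("cuda"))
    model.to("cpu")                   # evict back to host memory
    return y.cpu()

# One GPU serves both models by swapping them in on demand.
out = serve("model_a", torch.randn(4, 1024))
```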
1 code implementation • 15 Mar 2022 • Vivek Bharadwaj, Aydın Buluç, James Demmel
We also give two communication-eliding strategies that further reduce costs for FusedMM kernels: reusing the replication of an input dense matrix for the SDDMM and SpMM in sequence, or fusing the local SDDMM and SpMM kernels.
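For reference, a single-node sketch of the FusedMM pattern these kernels target: an SDDMM that samples a dense-dense product only at a sparse matrix's nonzeros, followed by an SpMM with the same dense input (the matrix whose replication the first strategy reuses). The distributed replication and kernel fusion themselves are not shown, and the sizes are illustrative.

```python
# Single-node SDDMM + SpMM sketch with SciPy sparse matrices.
import numpy as np
import scipy.sparse as sp

n, d = 1000, 64
A = sp.random(n, n, density=0.01, format="csr")   # sparsity pattern
H = np.random.rand(n, d)                          # dense input, used twice

# SDDMM: evaluate the dense product H @ H.T only at A's nonzeros.
rows, cols = A.nonzero()
vals = np.einsum("ij,ij->i", H[rows], H[cols])    # rowwise dot products
S = sp.csr_matrix((vals, (rows, cols)), shape=A.shape)

# SpMM: multiply the sampled sparse matrix by the same dense matrix H.
Y = S @ H
```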
no code implementations • 5 May 2021 • Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, Yakun Sophia Shao
Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect.
no code implementations • 16 Nov 2020 • Aditya Devarakonda, James Demmel
Stochastic gradient descent (SGD) is one of the most widely used optimization methods for solving various machine learning problems.
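For concreteness, the basic SGD iteration underlying the paper, written here for a least-squares loss; the function name, step size, and loss are illustrative, not the paper's setup.

```python
# Plain SGD on a least-squares objective: w <- w - eta * grad_i(w).
import numpy as np

def sgd(X, y, lr=0.01, epochs=10):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]   # gradient of one sample's loss
            w -= lr * grad
    return w
```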
no code implementations • 30 Oct 2020 • Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc Le, Yang You, Sameer Kumar
EfficientNets are a family of state-of-the-art image classification models based on efficiently scaled convolutional neural networks.
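The "efficient scaling" refers to EfficientNet's compound-scaling rule, which grows depth, width, and input resolution together from a single coefficient; the constants below are the ones reported in the original EfficientNet paper.

```python
# EfficientNet compound scaling: one coefficient phi scales depth,
# width, and resolution jointly (alpha, beta, gamma from the paper).
def compound_scale(phi: int, alpha: float = 1.2, beta: float = 1.1,
                   gamma: float = 1.15):
    depth_mult = alpha ** phi        # multiplier on baseline layer count
    width_mult = beta ** phi         # multiplier on baseline channel count
    res_mult = gamma ** phi          # multiplier on baseline image size
    return depth_mult, width_mult, res_mult

print(compound_scale(3))             # roughly the B3 configuration
```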
no code implementations • 15 Jun 2020 • Yang You, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, Cho-Jui Hsieh
For the first time, we scale the batch size on ImageNet to at least an order of magnitude larger than all previous work, and provide detailed studies on the performance of many state-of-the-art optimization schemes under this setting.
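As a point of reference for such large-batch studies, the widely used linear-scaling-with-warmup heuristic (Goyal et al.) looks like the sketch below; the baseline values are illustrative, and the paper evaluates considerably more sophisticated schemes.

```python
# Linear scaling rule with linear warmup for large-batch training.
def scaled_lr(batch_size, step, base_lr=0.1, base_batch=256,
              warmup_steps=500):
    target = base_lr * batch_size / base_batch   # grow LR with batch size
    if step < warmup_steps:                      # ramp up linearly at first
        return target * (step + 1) / warmup_steps
    return target
```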
1 code implementation • 20 Nov 2019 • Ruobing Han, James Demmel, Yang You
Our experimental results show that for many applications, APS can train state-of-the-art models with 8-bit gradients at no, or only a tiny (<0.05%), accuracy loss.
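A generic 8-bit gradient quantizer conveys the flavor; this is a simple max-scaling scheme, not necessarily APS's exact encoding, and the auto-precision selection logic is omitted.

```python
# Quantize a gradient tensor to int8 with a per-tensor scale,
# dequantize after communication.
import numpy as np

def quantize_int8(g: np.ndarray):
    scale = float(np.abs(g).max()) / 127.0
    if scale == 0.0:
        scale = 1.0                   # all-zero gradient: any scale works
    q = np.clip(np.round(g / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale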
24 code implementations • ICLR 2020 • Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches; a simplified sketch of the resulting update appears below.
Ranked #11 on Question Answering on SQuAD1.1 dev (F1 metric)
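A simplified NumPy sketch of the layerwise trust-ratio step that this strategy (LAMB) arrives at, applied on top of an Adam-style update; details such as bias correction differ in the full algorithm, and the hyperparameter values are illustrative.

```python
# One LAMB-style step for a single layer's weights w.
import numpy as np

def lamb_step(w, m, v, g, lr=1e-3, b1=0.9, b2=0.999,
              eps=1e-6, weight_decay=0.01):
    m = b1 * m + (1 - b1) * g                    # first moment
    v = b2 * v + (1 - b2) * g * g                # second moment
    update = m / (np.sqrt(v) + eps) + weight_decay * w
    trust = np.linalg.norm(w) / (np.linalg.norm(update) + eps)
    w = w - lr * trust * update                  # layerwise adaptive step
    return w, m, v
```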
1 code implementation • 24 Jan 2019 • Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
LEGW makes the Sqrt Scaling scheme useful in practice, and as a result we achieve much better results than with the Linear Scaling learning rate scheme.
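The two rules combine as follows: scaling the batch size by k scales the learning rate by sqrt(k) (Sqrt Scaling) and the warmup length linearly with k (LEGW). The baseline numbers below are placeholders.

```python
# Sqrt Scaling with Linear-Epoch Gradual Warmup (LEGW).
import math

def legw(batch_size, base_batch=256, base_lr=0.1, base_warmup_epochs=1.0):
    k = batch_size / base_batch
    lr = base_lr * math.sqrt(k)          # Sqrt Scaling rule
    warmup_epochs = base_warmup_epochs * k   # warmup grows linearly with k
    return lr, warmup_epochs
```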
no code implementations • 17 Dec 2017 • Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney
Parallel computing has played an important role in speeding up convex optimization methods for big data analytics and large-scale machine learning (ML).
no code implementations • 24 Oct 2017 • Saeed Soori, Aditya Devarakonda, James Demmel, Mert Gurbuzbalaban, Maryam Mehri Dehnavi
We formulate the algorithm for two different optimization methods on the Lasso problem and show that the latency cost is reduced by a factor of k while bandwidth and floating-point operation costs remain the same.
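The generic pattern behind a factor-of-k latency reduction is to unroll k iterations so that one communication round serves k updates. The sketch below illustrates this for plain least-squares gradient steps with a hypothetical allreduce callback; the paper derives the analogous reformulation for its Lasso solvers.

```python
# k local updates per communication round: one allreduce of Gram-like
# quantities replaces k per-iteration allreduces.
import numpy as np

def k_step_round(X_local, y_local, w, k, lr, allreduce):
    G = allreduce(X_local.T @ X_local)   # shared across all k updates
    b = allreduce(X_local.T @ y_local)
    for _ in range(k):                   # k steps, no further messages
        w = w - lr * (G @ w - b)         # gradient of 0.5 * ||Xw - y||^2
    return w
```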
1 code implementation • 14 Sep 2017 • Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, Kurt Keutzer
If we can make full use of the supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in one minute.
no code implementations • NeurIPS 2016 • Yang You, Xiangru Lian, Ji Liu, Hsiang-Fu Yu, Inderjit S. Dhillon, James Demmel, Cho-Jui Hsieh
In this paper, we propose and study an Asynchronous parallel Greedy Coordinate Descent (Asy-GCD) algorithm for minimizing a smooth function with bounded constraints.
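For reference, sequential greedy coordinate descent on an unconstrained quadratic looks like the sketch below; Asy-GCD runs such updates asynchronously across threads and handles the bound constraints, both of which this sketch omits.

```python
# Greedy coordinate descent: minimize 0.5 * x^T Q x - b^T x by updating
# the coordinate with the largest gradient magnitude at each step.
import numpy as np

def greedy_cd(Q, b, iters=100):
    x = np.zeros(len(b))
    for _ in range(iters):
        grad = Q @ x - b
        j = np.argmax(np.abs(grad))      # greedy coordinate choice
        x[j] -= grad[j] / Q[j, j]        # exact minimizer along coordinate j
    return x
```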
1 code implementation • 5 Jul 2016 • Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jiyan Yang, James Demmel, Jim Harrell, Venkat Krishnamurthy, Michael W. Mahoney, Prabhat
We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms.
Distributed, Parallel, and Cluster Computing • ACM classes: G.1.3; C.2.4
1 code implementation • 14 Feb 2012 • Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz
We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes communication; a reference Strassen recursion is sketched below.
Data Structures and Algorithms • Computational Complexity • Distributed, Parallel, and Cluster Computing • Numerical Analysis • Combinatorics • MSC classes: 68W40, 68W10 • ACM class: F.2.1
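For reference, the classic sequential Strassen recursion (assuming square matrices with power-of-two dimensions); the paper's contribution is a communication-optimal parallel schedule of the seven recursive products, which this sketch does not attempt.

```python
# Strassen's matrix multiplication: 7 recursive products instead of 8.
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:                     # fall back to classical multiply
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(256, 256); B = np.random.rand(256, 256)
assert np.allclose(strassen(A, B), A @ B)   # sanity check vs. NumPy
```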