Search Results for author: K. V. Rashmi

Found 5 papers, 3 papers with code

Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs

1 code implementation • 19 Apr 2021 • Jack Kosaian, K. V. Rashmi

Algorithm-based fault tolerance (ABFT) is emerging as an efficient approach for fault tolerance in neural networks (NNs).

ECRM: Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding

no code implementations • 5 Apr 2021 • Kaige Liu, Jack Kosaian, K. V. Rashmi

We present ECRM, a training system for deep learning recommendation models (DLRMs) that achieves efficient fault tolerance using erasure coding.

Parity Models: A General Framework for Coding-Based Resilience in ML Inference

no code implementations • 2 May 2019 • Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

To scale to high query rates, prediction serving systems run on many machines in cluster settings and are therefore prone to slowdowns and failures that inflate tail latency and cause violations of strict latency targets.

BIG-bench Machine Learning, Image Classification, +3 more

Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation

3 code implementations • 4 Jun 2018 • Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

To the best of our knowledge, this work proposes the first learning-based approach for designing codes, and also presents the first coding-theoretic solution that can provide resilience for any non-linear (differentiable) computation.

BIG-bench Machine Learning

DART: Dropouts meet Multiple Additive Regression Trees

1 code implementation • 7 May 2015 • K. V. Rashmi, Ran Gilad-Bachrach

Multiple Additive Regression Trees (MART), an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and it is widely used in practice.

regression
