Search Results for author: K. V. Rashmi

Found 5 papers, 3 papers with code

Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs

1 code implementation • 19 Apr 2021 • Jack Kosaian, K. V. Rashmi

Algorithm-based fault tolerance (ABFT) is emerging as an efficient approach for fault tolerance in NNs.

ECRM: Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding

no code implementations • 5 Apr 2021 • Kaige Liu, Jack Kosaian, K. V. Rashmi

We present ECRM, a DLRM training system that achieves efficient fault tolerance using erasure coding.

Parity Models: A General Framework for Coding-Based Resilience in ML Inference

no code implementations • 2 May 2019 • Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

In order to scale to high query rates, prediction serving systems are run on many machines in cluster settings, and thus are prone to slowdowns and failures that inflate tail latency and cause violations of strict latency targets.

BIG-bench Machine Learning, Image Classification, +3 more

Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation

3 code implementations • 4 Jun 2018 • Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman

To the best of our knowledge, this work proposes the first learning-based approach for designing codes, and also presents the first coding-theoretic solution that can provide resilience for any non-linear (differentiable) computation.

BIG-bench Machine Learning

DART: Dropouts meet Multiple Additive Regression Trees

1 code implementation • 7 May 2015 • K. V. Rashmi, Ran Gilad-Bachrach

Multiple Additive Regression Trees (MART), an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and it is widely used in practice.

regression
