no code implementations • 27 Dec 2023 • Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov
Serving models under such conditions requires these systems to strike a careful balance between the latency and accuracy requirements of the application and the efficient utilization of scarce resources.
no code implementations • 24 Oct 2023 • Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov
To address this shortcoming, we propose a novel KD approach to GNN compression that we call Attention-Based Knowledge Distillation (ABKD).
no code implementations • 20 Jul 2023 • Hugo Latapie, Shan Yu, Patrick Hammer, Kristinn R. Thorisson, Vahagn Petrosyan, Brandon Kynoch, Alind Khare, Payman Behnam, Alexey Tumanov, Aksheit Saxena, Anish Aralikatti, Hanning Chen, Mohsen Imani, Mike Archbold, Tangrui Li, Pei Wang, Justin Hart
Traditional computer vision models often necessitate extensive data acquisition, annotation, and validation.
no code implementations • 21 Jun 2023 • Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexey Tumanov
For the stream of queries, SUSHI yields up to a 25% improvement in latency and a 0.98% increase in served accuracy.
no code implementations • 26 Jan 2023 • Alind Khare, Animesh Agrawal, Myungjin Lee, Alexey Tumanov
We propose SuperFed, an architectural framework that incurs $O(1)$ cost to co-train a large family of models in a federated fashion by leveraging weight-shared learning.
no code implementations • 26 Oct 2022 • Yanbo Xu, Alind Khare, Glenn Matlin, Monish Ramadoss, Rishikesan Kamaleswaran, Chao Zhang, Alexey Tumanov
It achieves accuracy within 0.1% of the highest-performing multi-class baseline, while saving close to 20X on the spatio-temporal cost of inference and predicting disease onset 3.5 hours earlier.
1 code implementation • 26 Apr 2021 • Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov
The emergence of CNNs in mainstream deployment has necessitated methods to design and train efficient architectures tailored to maximize the accuracy under diverse hardware & latency constraints.
2 code implementations • ICLR 2021 • Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov
The emergence of CNNs in mainstream deployment has necessitated methods to design and train efficient architectures tailored to maximize the accuracy under diverse hardware & latency constraints.
3 code implementations • 10 Aug 2020 • Shenda Hong, Yanbo Xu, Alind Khare, Satria Priambada, Kevin Maher, Alaa Aljiffry, Jimeng Sun, Alexey Tumanov
HOLMES is tested on a risk prediction task on pediatric cardio ICU data, achieving above 95% prediction accuracy and sub-second latency in a 64-bed simulation.
no code implementations • 25 Oct 2019 • Koyel Mukherjee, Alind Khare, Ashish Verma
Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime.