no code implementations • 22 Jul 2024 • Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Viswanath Ganapathy, Rafael Esteves, Shreya Kadambi, Shubhankar Borse, Paul Whatmough, Risheek Garrepalli, Mart van Baalen, Harris Teague, Markus Nagel
In this paper, we propose Sparse High Rank Adapters (SHiRA), which directly finetune 1-2% of the base model weights while leaving the others unchanged, resulting in a highly sparse adapter.
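A minimal sketch of the general idea, assuming a fixed magnitude-based mask selects the trainable 1-2% of entries; the function name and mask-selection rule below are illustrative assumptions, not the paper's exact procedure:

```python
import torch

def sparse_adapter_step(weight, grad, mask, lr=1e-4):
    """Apply a gradient step only to the masked (trainable) entries.

    weight: a base-model weight tensor (frozen outside the mask)
    grad:   gradient of the loss w.r.t. weight
    mask:   binary tensor marking the ~1-2% of entries chosen as the adapter
    """
    with torch.no_grad():
        weight -= lr * grad * mask  # entries outside the mask keep their pretrained values
    return weight

# Illustrative setup: pick the top 1% of weights by magnitude as the trainable set.
W = torch.randn(1024, 1024)
k = max(1, int(0.01 * W.numel()))
threshold = W.abs().flatten().topk(k).values.min()
mask = (W.abs() >= threshold).float()

# The sparse adapter is just the delta on the masked entries, so it can be
# stored and swapped without touching the rest of W.
grad = torch.randn_like(W)  # stand-in for a real backward pass
W = sparse_adapter_step(W, grad, mask)
```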
no code implementations • 19 Jun 2024 • Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Viswanath Ganapathy, Rafael Esteves, Shreya Kadambi, Shubhankar Borse, Paul Whatmough, Risheek Garrepalli, Mart van Baalen, Harris Teague, Markus Nagel
However, from a mobile deployment standpoint, we can either avoid inference overhead in the fused mode at the cost of losing the ability to switch adapters rapidly, or enable rapid switching in the unfused mode while suffering significantly (up to 30%) higher inference latency.
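A minimal sketch of this trade-off for a standard low-rank (LoRA-style) adapter layer; the names and shapes are illustrative assumptions, not the paper's implementation. Fusing folds the low-rank update into the base weight, so inference costs nothing extra but switching adapters means rewriting weights; unfused inference keeps the adapter separate, so swapping is cheap but every forward pass pays for two extra matmuls.

```python
import torch

def fused_forward(x, W, A, B, scale=1.0):
    # Fused mode: fold the low-rank update into the base weight once.
    # Inference is as cheap as the base model, but switching adapters
    # requires re-fusing (rewriting) the weights.
    W_fused = W + scale * (B @ A)
    return x @ W_fused.T

def unfused_forward(x, W, A, B, scale=1.0):
    # Unfused mode: keep the adapter separate. Swapping adapters is just
    # replacing A and B, but each forward pass adds two extra matmuls.
    return x @ W.T + scale * ((x @ A.T) @ B.T)

# Illustrative shapes: a 64x64 base weight with a rank-4 adapter.
d_in, d_out, r = 64, 64, 4
x = torch.randn(2, d_in)
W = torch.randn(d_out, d_in)
A = torch.randn(r, d_in)
B = torch.randn(d_out, r)

# Both modes compute the same function, up to floating-point error.
assert torch.allclose(fused_forward(x, W, A, B),
                      unfused_forward(x, W, A, B), atol=1e-4)
```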
no code implementations • 26 Mar 2024 • Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Kyunggeun Lee, Jun Ma, Harris Teague
Large generative models such as large language models (LLMs) and diffusion models have revolutionized the fields of NLP and computer vision, respectively.
1 code implementation • 27 Apr 2023 • Burak Bartan, Haoming Li, Harris Teague, Christopher Lott, Bistra Dilkina
The deployment and training of neural networks on edge computing devices pose many challenges.
no code implementations • 13 Jul 2022 • Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan
Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance.
1 code implementation • 19 Jun 2019 • Hsin-Pai Cheng, Tunhou Zhang, Yukun Yang, Feng Yan, Shi-Yu Li, Harris Teague, Hai Li, Yiran Chen
Designing neural architectures for edge devices is subject to constraints of accuracy, inference latency, and computational cost.