no code implementations • 18 Oct 2024 • Zifei Xu, Sayeh Sharify, Wanzin Yazar, Tristan Webb, Xin Wang
Large language models of high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low numerical precision.
no code implementations • 15 Oct 2024 • Zifei Xu, Alexander Lan, Wanzin Yazar, Tristan Webb, Sayeh Sharify, Xin Wang
Generalization abilities of well-trained large language models (LLMs) are known to scale predictably as a function of model size.
no code implementations • 12 May 2024 • Sayeh Sharify, Utkarsh Saxena, Zifei Xu, Wanzin Yazar, Ilya Soloveychik, Xin Wang
Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges.
no code implementations • 14 Apr 2024 • Tian Jin, Wanzin Yazar, Zifei Xu, Sayeh Sharify, Xin Wang
We demonstrate that using this custom CUDA kernel improves the throughput of LLM inference by 28%.
no code implementations • 16 Oct 2023 • Tomas M. Bosschieter, Zifei Xu, Hui Lan, Benjamin J. Lengerich, Harsha Nori, Ian Painter, Vivienne Souter, Rich Caruana
The interpretability of the EBM models reveals surprising insights into the features contributing to risk (e.g., maternal height is the second most important feature for shoulder dystocia) and may have potential for clinical application in the prediction and prevention of serious complications in pregnancy.
no code implementations • 12 Jul 2022 • Tomas M. Bosschieter, Zifei Xu, Hui Lan, Benjamin J. Lengerich, Harsha Nori, Kristin Sitcov, Vivienne Souter, Rich Caruana
Most pregnancies and births result in a good outcome, but complications are not uncommon, and when they do occur, they can be associated with serious implications for mothers and babies.