no code implementations • 12 Mar 2024 • Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu
We observe a high degree of redundancy across attention heads in terms of which tokens they attend to.
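A minimal sketch of that observation, comparing heads by the similarity of their attention maps; the random tensor, the 0.9 threshold, and the greedy grouping are illustrative assumptions, not the paper's clustering procedure:

```python
import numpy as np

# Hypothetical attention weights for one layer: [num_heads, seq_len, seq_len],
# each row a distribution over the tokens a head attends to.
rng = np.random.default_rng(0)
attn = rng.random((8, 16, 16))
attn /= attn.sum(axis=-1, keepdims=True)

# Flatten each head's attention map and measure pairwise cosine similarity.
flat = attn.reshape(attn.shape[0], -1)
unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
sim = unit @ unit.T

# Heads whose attention maps are nearly identical are candidates for sharing
# a single computation (here, a simple greedy grouping by similarity threshold).
groups, assigned = [], set()
for h in range(sim.shape[0]):
    if h in assigned:
        continue
    members = [j for j in range(sim.shape[0]) if j not in assigned and sim[h, j] > 0.9]
    assigned.update(members)
    groups.append(members)
print(groups)
```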
1 code implementation • 7 Mar 2024 • Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung
We introduce Syntax-Aware Fill-In-the-Middle (SAFIM), a new benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task.
Ranked #1 on Code Completion on SAFIM
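A minimal sketch of what a FIM example looks like; the sentinel tokens below are illustrative assumptions, as the actual tokens vary by model:

```python
# Illustrative FIM prompt construction; sentinel token strings are assumptions.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

prefix = "def add(a, b):\n    "
suffix = "\n    return result"

# Prefix-Suffix-Middle ordering: the model is asked to generate the span
# that belongs between the prefix and the suffix.
prompt = f"{PRE}{prefix}{SUF}{suffix}{MID}"
print(prompt)
# A syntax-aware benchmark like SAFIM expects a completion such as
# "result = a + b" that keeps the surrounding code syntactically valid.
```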
1 code implementation • 5 Jan 2024 • Linyuan Gong, Mostafa Elhoushi, Alvin Cheung
Large language models (LLMs) have made significant advancements in code-related tasks, yet many LLMs treat code as simple sequences, neglecting its structured nature.
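As a concrete illustration of that structure, Python's standard ast module parses a snippet into a tree whose nesting a plain token sequence flattens away:

```python
import ast

# Parsing a snippet into an abstract syntax tree exposes structure that
# token sequences hide: nesting, scopes, and operator composition.
source = "def square(x):\n    return x * x"
tree = ast.parse(source)

# The tree makes nesting explicit: a FunctionDef node containing a Return
# node containing a BinOp node.
print(ast.dump(tree, indent=2))
```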
no code implementations • 5 Dec 2023 • Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani
Armed with this knowledge, we devise novel pruning metrics that operate in embedding space to identify and remove low-quality entries in the Stack dataset.
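A minimal sketch of one embedding-space pruning signal (distance to the corpus centroid); the toy embeddings and the 10% threshold are assumptions for illustration, not the paper's metrics:

```python
import numpy as np

# Toy stand-in for document embeddings; in practice these would come from
# a pretrained encoder applied to source files (an assumption of this sketch).
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# One simple embedding-space signal: similarity to the corpus centroid.
# Entries far from the bulk of the data are flagged as potentially low quality.
centroid = emb.mean(axis=0)
centroid /= np.linalg.norm(centroid)
scores = emb @ centroid

keep = scores > np.quantile(scores, 0.1)  # drop the 10% most atypical entries
print(keep.sum(), "of", len(emb), "entries kept")
```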
1 code implementation • 1 Dec 2023 • Jiacheng Yang, Christina Giannoula, Jun Wu, Mostafa Elhoushi, James Gleeson, Gennady Pekhimenko
Minuet proposes to (i) replace the hash tables used in the Map step with a novel segmented sorting, double-traversed binary search algorithm that makes heavy use of the on-chip memory hierarchy of GPUs, (ii) use a lightweight scheme to autotune the tile size in the Gather and Scatter operations of the GMaS step, adapting execution to the particular characteristics of each SC layer, dataset, and GPU architecture, and (iii) employ a padding-efficient GEMM grouping approach that reduces both memory padding and kernel launch overheads.
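A minimal NumPy sketch of what the Gather-GEMM-Scatter step computes for one kernel offset of a sparse convolution; the indices and sizes are illustrative:

```python
import numpy as np

# Toy Gather-GEMM-Scatter step: for a given kernel offset, only the
# (input point, output point) pairs matched by the Map step participate.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))   # features of active input points
weight = rng.normal(size=(16, 32))   # weights for one kernel offset
in_idx = np.array([3, 17, 42, 99])   # input points matched by the Map step
out_idx = np.array([0, 5, 5, 7])     # output points they contribute to

out = np.zeros((10, 32))
gathered = feats[in_idx]             # Gather
partial = gathered @ weight          # GEMM
np.add.at(out, out_idx, partial)     # Scatter (accumulating)
print(out.shape)
```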
1 code implementation • 3 Oct 2023 • Anas Mahmoud, Mostafa Elhoushi, Amro Abbas, Yu Yang, Newsha Ardalani, Hugh Leather, Ari Morcos
We propose Sieve, a pruning signal that uses synthetic captions, generated by image-captioning models pretrained on small, diverse, and well-aligned image-text datasets, to evaluate the alignment of noisy image-text pairs.
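A minimal sketch of the alignment signal, with a placeholder encoder standing in for a real pretrained text encoder (an assumption of this sketch):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder sentence encoder (an assumption of this sketch); in
    practice a pretrained text encoder would be used."""
    rng = np.random.default_rng(sum(map(ord, text)))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

# Alignment signal: similarity between the noisy web alt-text and a synthetic
# caption produced by a captioning model for the same image.
alt_text = "IMG_1234.JPG best price buy now"
synthetic_caption = "a brown dog running across a grassy field"

score = float(embed(alt_text) @ embed(synthetic_caption))
print(f"alignment score: {score:.3f}")  # low score -> candidate for pruning
```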
no code implementations • 11 Sep 2023 • Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, Hugh Leather
We explore the novel application of Large Language Models to code optimization.
no code implementations • 9 Jan 2023 • Youwei Liang, Kevin Stone, Ali Shameli, Chris Cummins, Mostafa Elhoushi, Jiadong Guo, Benoit Steiner, Xiaomeng Yang, Pengtao Xie, Hugh Leather, Yuandong Tian
Finding the optimal sequence of compilation passes can lead to a significant reduction in program size and/or improvement in program efficiency.
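A minimal sketch of comparing two pass pipelines with LLVM's opt tool; input.ll is a placeholder file, the -passes syntax assumes a recent LLVM with the new pass manager, and line counting is only a crude size proxy:

```python
import subprocess

def ir_size(path: str) -> int:
    """Crude program-size proxy: count indented (instruction) lines in LLVM IR."""
    with open(path) as f:
        return sum(1 for line in f if line.startswith("  "))

# Compare two pass pipelines on the same module (assumes opt is installed).
for pipeline in ["default<O1>", "default<Oz>"]:
    subprocess.run(
        ["opt", "-S", f"-passes={pipeline}", "input.ll", "-o", "out.ll"],
        check=True,
    )
    print(pipeline, ir_size("out.ll"))
```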
1 code implementation • 24 Oct 2022 • Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty
We present OLLA, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks.
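A simplified view of the quantity OLLA optimizes: peak memory is determined by which tensor lifetimes overlap, so reordering operators (and thus shifting lifetimes) can lower the peak. The sizes and lifetimes below are made up, and this is not OLLA's algorithm:

```python
# Each tensor is live from its first use to its last use.
tensors = [
    # (first_use_step, last_use_step, size_in_MB)
    (0, 3, 100),
    (1, 2, 200),
    (2, 5, 50),
]

def peak_memory(tensors):
    events = []
    for start, end, size in tensors:
        events.append((start, size))      # allocation at first use
        events.append((end + 1, -size))   # deallocation after last use
    live, peak = 0, 0
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak

print(peak_memory(tensors), "MB peak")
```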
no code implementations • 18 Jul 2022 • Amir H. Ashouri, Mostafa Elhoushi, Yuzhe Hua, Xiang Wang, Muhammad Asif Manzoor, Bryan Chan, Yaoqing Gao
This paper presents MLGOPerf, the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner.
1 code implementation • CVPR 2022 • Sara Elkerdawy, Mostafa Elhoushi, Hong Zhang, Nilanjan Ray
On CIFAR, we match the accuracy of SOTA methods while achieving 15% and 24% higher FLOPs reduction.
1 code implementation • 11 Jul 2020 • Sara Elkerdawy, Mostafa Elhoushi, Abhineet Singh, Hong Zhang, Nilanjan Ray
LayerPrune presents a set of layer pruning methods based on different criteria that achieve higher latency reduction than filter pruning methods at similar accuracy.
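A toy illustration of why removing layers can cut latency more than shrinking them, assuming a fixed per-layer overhead (e.g., kernel launch); the numbers are illustrative, not measurements from the paper:

```python
# Filter pruning shrinks layers but keeps the sequential depth (and per-layer
# overhead); layer pruning removes depth outright.
layers = [10.0] * 20   # per-layer compute cost of a toy model
overhead = 0.5         # fixed per-layer overhead (e.g., kernel launch)

def latency(costs):
    return sum(c + overhead for c in costs)

baseline = latency(layers)
filter_pruned = latency([c * 0.7 for c in layers])   # 30% fewer FLOPs per layer
layer_pruned = latency(layers[:14])                  # 30% fewer layers

print(f"baseline: {baseline:.1f}, filter-pruned: {filter_pruned:.1f}, "
      f"layer-pruned: {layer_pruned:.1f}")
```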
1 code implementation • 10 Sep 2019 • Mostafa Elhoushi, Ye Henry Tian, Zihao Chen, Farhan Shafiq, Joey Yiwei Li
In our approach, we train the model from scratch (i.e., with randomly initialized weights) in its original architecture for a small number of epochs, then decompose the model and continue training the decomposed model for the remaining epochs.
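A minimal sketch of the decompose-then-continue-training idea, using truncated SVD of a linear layer as a stand-in for the decomposition (an assumption of this sketch):

```python
import numpy as np

# Decompose a trained weight matrix into two low-rank factors; training then
# continues on the factorized layers instead of the original one.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))    # weight after a few epochs of training

rank = 32
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]         # (256, rank)
B = Vt[:rank]                      # (rank, 512)

x = rng.normal(size=(512,))
y_full, y_low = W @ x, A @ (B @ x)  # one big matmul vs. two small ones
print("approx error:", np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))
```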
1 code implementation • 30 May 2019 • Mostafa Elhoushi, Zihao Chen, Farhan Shafiq, Ye Henry Tian, Joey Yiwei Li
This family of neural network architectures, which use convolutional shifts and fully connected shifts, is referred to as DeepShift models.
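A minimal sketch of the underlying idea: approximating each weight by a signed power of two so that multiplications reduce to bit shifts. This illustrates the representation only, not the paper's training procedure:

```python
import numpy as np

# Approximate each weight as sign * 2^p with integer p, so multiplication
# can be implemented as a bit shift (plus a sign flip) in hardware.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 8))

sign = np.sign(W)
p = np.round(np.log2(np.abs(W) + 1e-12)).clip(-8, 0)  # integer shift amounts
W_shift = sign * (2.0 ** p)

x = rng.normal(size=(8,))
print("dense :", W @ x)
print("shift :", W_shift @ x)  # computable with shifts and sign flips only
```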