Search Results for author: Mostafa Elhoushi

Found 14 papers, 9 papers with code

Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks

1 code implementation • 7 Mar 2024 • Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung

We introduce Syntax-Aware Fill-In-the-Middle (SAFIM), a new benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task.

Code Completion
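
For context, FIM evaluation feeds a model the code surrounding a masked span and asks it to generate the span itself. Below is a minimal sketch of building such an example in the common prefix-suffix-middle (PSM) prompt format; the `<PRE>`/`<SUF>`/`<MID>` sentinels are illustrative placeholders (each model defines its own special tokens), and this is not SAFIM's benchmark code.

```python
# Minimal sketch of a code Fill-in-the-Middle (FIM) example in the common
# prefix-suffix-middle (PSM) prompt format. The sentinel strings below are
# illustrative; real models each define their own special tokens.

def build_fim_prompt(code: str, start: int, end: int) -> tuple[str, str]:
    """Mask code[start:end] and return (prompt, expected_middle)."""
    prefix, middle, suffix = code[:start], code[start:end], code[end:]
    prompt = f"<PRE>{prefix}<SUF>{suffix}<MID>"
    return prompt, middle

source = "def add(a, b):\n    return a + b\n"
# Mask the expression after `return` (a syntax-aware span, not a random one).
i = source.index("a + b")
prompt, target = build_fim_prompt(source, i, i + len("a + b"))
print(prompt)   # the model should generate `a + b` after <MID>
print(target)
```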

AST-T5: Structure-Aware Pretraining for Code Generation and Understanding

1 code implementation • 5 Jan 2024 • Linyuan Gong, Mostafa Elhoushi, Alvin Cheung

Large language models (LLMs) have made significant advancements in code-related tasks, yet many LLMs treat code as simple sequences, neglecting its structured nature.

Code Generation
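
The core idea, masking spans that correspond to whole AST subtrees rather than arbitrary token runs, can be illustrated with Python's built-in `ast` module. This is a toy sketch of structure-aware masking, not AST-T5's actual pretraining pipeline.

```python
import ast

# Illustrative sketch of structure-aware span masking: instead of masking a
# random token span, mask the source text of a whole AST subtree, so the
# corruption respects code structure.

source = "def area(r):\n    return 3.14159 * r * r\n"
tree = ast.parse(source)

# Collect candidate subtrees (expressions with a recoverable source span).
spans = [ast.get_source_segment(source, n)
         for n in ast.walk(tree) if isinstance(n, ast.expr)]
spans = [s for s in spans if s]

masked = source.replace(spans[0], "<MASK_0>", 1)
print(masked)     # input with one whole subtree masked
print(spans[0])   # target the model should reconstruct
```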

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

no code implementations • 5 Dec 2023 • Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani

Armed with this knowledge, we devise novel pruning metrics that operate in embedding space to identify and remove low-quality entries in the Stack dataset.

Code Generation
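
As a rough illustration of pruning in embedding space, the sketch below scores each dataset entry by its cosine similarity to a centroid of known-corrupted examples and drops the closest ones. The paper's actual synthetic-corruption metrics differ, and the random vectors here stand in for real code embeddings.

```python
import numpy as np

# Toy sketch of embedding-space pruning: score each sample by its cosine
# similarity to a centroid of known low-quality ("synthetically corrupted")
# examples and drop the closest ones.

rng = np.random.default_rng(0)
data_emb = rng.normal(size=(1000, 64))    # embeddings of dataset entries
corrupt_emb = rng.normal(size=(50, 64))   # embeddings of corrupted code

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

centroid = normalize(corrupt_emb.mean(axis=0, keepdims=True))
scores = normalize(data_emb) @ centroid.T   # similarity to corruption

keep = scores.ravel().argsort()[: int(0.8 * len(data_emb))]  # prune top 20%
print(f"kept {len(keep)} of {len(data_emb)} samples")
```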

Minuet: Accelerating 3D Sparse Convolutions on GPUs

1 code implementation • 1 Dec 2023 • Jiacheng Yang, Christina Giannoula, Jun Wu, Mostafa Elhoushi, James Gleeson, Gennady Pekhimenko

Minuet proposes to (i) replace the hash tables used in the Map step with a novel segmented sorting double-traversed binary search algorithm that makes heavy use of the on-chip memory hierarchy of GPUs, (ii) use a lightweight scheme to autotune the tile size in the Gather and Scatter operations of the GMaS step, adapting execution to the particular characteristics of each SC layer, dataset, and GPU architecture, and (iii) employ a padding-efficient GEMM grouping approach that reduces both memory padding and kernel launching overheads.
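
The gist of point (i), replacing hash-table lookups with sorted arrays and binary search, can be sketched on the CPU with NumPy. The coordinate packing and the single kernel offset below are illustrative; the segmented sorting, double traversal, and GPU memory-hierarchy tuning that make Minuet fast are omitted entirely.

```python
import numpy as np

# Toy sketch of the idea behind the Map step: keep input coordinates in a
# sorted array and locate each queried (offset) coordinate with binary
# search instead of a hash table. Coordinates are packed into single
# integers so np.searchsorted can be used.

def pack(coords, bits=16):
    x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]
    return (x << (2 * bits)) | (y << bits) | z

coords = np.array([[0, 0, 0], [0, 1, 2], [3, 1, 0], [5, 5, 5]], dtype=np.int64)
keys = np.sort(pack(coords))

offset = np.array([0, 1, 2])          # one kernel offset of a 3x3x3 SC filter
queries = pack(coords + offset)

idx = np.searchsorted(keys, queries)  # binary search, one per query
hit = (idx < len(keys)) & (keys[np.clip(idx, 0, len(keys) - 1)] == queries)
print(hit)  # which (input, offset) pairs contribute to the kernel map
```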

Sieve: Multimodal Dataset Pruning Using Image Captioning Models

1 code implementation • 3 Oct 2023 • Anas Mahmoud, Mostafa Elhoushi, Amro Abbas, Yu Yang, Newsha Ardalani, Hugh Leather, Ari Morcos

We propose a pruning signal, Sieve, that employs synthetic captions generated by image-captioning models pretrained on small, diverse, and well-aligned image-text pairs to evaluate the alignment of noisy image-text pairs.

Image Captioning • Language Modelling +1
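
A minimal sketch of such a pruning signal follows: caption each image with a pretrained captioning model, then rank image-text pairs by the similarity between the synthetic caption and the noisy alt-text. Here `caption_model` and `embed_text` are hypothetical stand-ins for a real captioner and sentence encoder; Sieve's actual scoring details differ.

```python
import numpy as np

# Sketch of a Sieve-style pruning signal: a synthetic caption from a
# well-aligned captioning model acts as a reference description, and the
# noisy alt-text is scored by its similarity to that reference.
# `caption_model` and `embed_text` are hypothetical placeholders.

def sieve_score(image, alt_text, caption_model, embed_text):
    synthetic = caption_model(image)          # well-aligned description
    a, b = embed_text(synthetic), embed_text(alt_text)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def prune(pairs, caption_model, embed_text, keep_fraction=0.5):
    scores = [sieve_score(img, txt, caption_model, embed_text)
              for img, txt in pairs]
    order = np.argsort(scores)[::-1]          # highest alignment first
    return [pairs[i] for i in order[: int(keep_fraction * len(pairs))]]
```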

OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks

1 code implementation • 24 Oct 2022 • Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty

We present OLLA, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks.
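
To make the setting concrete: tensors whose lifetimes overlap must occupy disjoint address ranges, and the objective is to minimize peak memory. OLLA solves this as a joint lifetime-and-location optimization; the greedy first-fit placement below only illustrates the problem, not the paper's solver.

```python
# Toy illustration of the problem OLLA optimizes: tensors that are live at
# the same time must occupy disjoint address ranges, and the goal is to
# minimize peak memory. Greedy first-fit placement, largest tensors first.

def first_fit(tensors):
    """tensors: list of (name, start_step, end_step, size)."""
    placed = []                          # (offset, size, start, end)
    plan = {}
    for name, s, e, size in sorted(tensors, key=lambda t: -t[3]):
        offset = 0
        for off, sz, ps, pe in sorted(placed):
            # If lifetimes overlap and address ranges would collide,
            # slide the candidate offset above the existing allocation.
            if s <= pe and ps <= e and offset < off + sz and off < offset + size:
                offset = off + sz
        placed.append((offset, size, s, e))
        plan[name] = offset
    peak = max(off + sz for off, sz, _, _ in placed)
    return plan, peak

plan, peak = first_fit([("a", 0, 3, 4), ("b", 1, 2, 2), ("c", 4, 6, 4)])
print(plan, "peak =", peak)  # "c" reuses "a"'s address range after step 3
```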

To Filter Prune, or to Layer Prune, That Is The Question

1 code implementation • 11 Jul 2020 • Sara Elkerdawy, Mostafa Elhoushi, Abhineet Singh, Hong Zhang, Nilanjan Ray

LayerPrune presents a set of layer pruning methods based on different criteria that achieve higher latency reduction than filter pruning methods at similar accuracy.
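
A toy version of criterion-based layer pruning: score each layer with some importance measure and drop the lowest-ranked layers wholesale. The mean-absolute-weight criterion below is one arbitrary choice among the criteria the paper compares, and the code is illustrative rather than LayerPrune's implementation.

```python
import numpy as np

# Toy sketch of layer pruning: rank layers by an importance score (here,
# mean absolute weight) and remove whole layers. Removing entire layers
# shortens the sequential critical path, which is why layer pruning tends
# to cut latency more than filter pruning at comparable accuracy.

rng = np.random.default_rng(0)
layers = [rng.normal(scale=s, size=(64, 64)) for s in (1.0, 0.1, 0.8, 0.05)]

importance = [np.abs(w).mean() for w in layers]
n_prune = 2
drop = np.argsort(importance)[:n_prune]           # least important layers
kept = [w for i, w in enumerate(layers) if i not in set(drop)]
print("dropped layers:", sorted(drop.tolist()))   # -> [1, 3]
```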

Accelerating Training using Tensor Decomposition

1 code implementation • 10 Sep 2019 • Mostafa Elhoushi, Ye Henry Tian, Zihao Chen, Farhan Shafiq, Joey Yiwei Li

In our approach, we train the model from scratch (i.e., with randomly initialized weights) in its original architecture for a small number of epochs, then decompose the model, and continue training the decomposed model to the end.

Tensor Decomposition
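
The decompose-then-continue-training step can be sketched with a truncated SVD, which factors one dense layer into two thinner ones. The paper applies tensor decompositions (e.g. to convolutional layers); plain SVD on a weight matrix is the simplest analogue and is shown only for illustration.

```python
import numpy as np

# Minimal sketch of decompose-then-continue-training: after a few warm-up
# epochs, factor a trained weight matrix into two low-rank factors via
# truncated SVD, replacing one dense layer with two thinner ones, then
# keep training the factors.

def decompose(W, rank):
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out, rank)
    B = Vt[:rank, :]             # (rank, in)
    return A, B                  # W ~= A @ B, fewer params if rank is small

W = np.random.default_rng(0).normal(size=(256, 256))  # "warmed-up" weights
A, B = decompose(W, rank=32)
print(W.size, "->", A.size + B.size, "parameters")    # 65536 -> 16384
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.2f}")    # further training recovers this
```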

DeepShift: Towards Multiplication-Less Neural Networks

1 code implementation • 30 May 2019 • Mostafa Elhoushi, Zihao Chen, Farhan Shafiq, Ye Henry Tian, Joey Yiwei Li

This family of neural network architectures (that use convolutional shifts and fully connected shifts) is referred to as DeepShift models.

Edge-computing • Quantization
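
The arithmetic behind the idea: if each weight is rounded to a signed power of two, a multiply w * x reduces to a sign flip plus a bitwise shift of x. DeepShift trains the sign and shift parameters directly; the post-hoc rounding below merely illustrates the substitution.

```python
import numpy as np

# Toy sketch of the DeepShift idea: round each weight to a signed power of
# two, so w * x becomes sign * (x shifted by `shift` bit positions). The
# actual DeepShift models train the shift and sign parameters directly.

def to_shift(w, eps=1e-8):
    sign = np.sign(w)
    shift = np.round(np.log2(np.abs(w) + eps)).astype(int)
    return sign, shift               # w ~= sign * 2**shift

def shift_multiply(x, sign, shift):
    # For integer x and shift >= 0 this is sign * (x << shift); using
    # 2.0**shift also covers negative shifts (weights with |w| < 1).
    return sign * x * (2.0 ** shift)

w = np.array([0.30, -1.70, 0.06])
sign, shift = to_shift(w)
print(sign * 2.0 ** shift)           # [ 0.25 -2.    0.0625]
print(shift_multiply(np.array([8, 8, 8]), sign, shift))
```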
