no code implementations • EMNLP 2021 • Yimeng Wu, Mehdi Rezagholizadeh, Abbas Ghaddar, Md Akmal Haidar, Ali Ghodsi
Intermediate layer matching has been shown to be an effective approach for improving knowledge distillation (KD).
no code implementations • Findings (EMNLP) 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge in a large neural network into a smaller one.
no code implementations • Findings (EMNLP) 2021 • Peng Lu, Abbas Ghaddar, Ahmad Rashid, Mehdi Rezagholizadeh, Ali Ghodsi, Philippe Langlais
Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language models.
no code implementations • 25 May 2022 • Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Ali Ghodsi, Pascal Poupart
Knowledge Distillation (KD) is a prominent neural model compression technique which heavily relies on teacher network predictions to guide the training of a student model.
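As a rough, hedged illustration of how teacher predictions can guide a student (not the exact recipe of any paper listed here), a standard soft-target KD objective combines a temperature-scaled KL divergence between teacher and student logits with the usual cross-entropy on gold labels; the temperature and mixing weight below are illustrative placeholders.

```python
# Minimal KD loss sketch (PyTorch). `student_logits` and `teacher_logits`
# are [batch, num_classes]; T and alpha are illustrative hyperparameters,
# not values taken from the papers above.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```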
no code implementations • 21 May 2022 • Abbas Ghaddar, Yimeng Wu, Sunyam Bagga, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais
There is a growing body of work in recent years to develop pre-trained language models (PLMs) for the Arabic language.
no code implementations • 18 Apr 2022 • Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban
To solve this problem, position embeddings are defined exclusively for each time step to enrich word information.
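A minimal sketch of the general idea of per-time-step position embeddings (a learned vector per position added to each token embedding); the module name, vocabulary size, and dimensions are assumptions for illustration, not the paper's configuration.

```python
# Learned position embeddings: one vector per time step, added to the
# token embedding. All sizes are placeholders.
import torch
import torch.nn as nn

class TokenAndPositionEmbedding(nn.Module):
    def __init__(self, vocab_size=30000, max_len=512, dim=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.pos = nn.Embedding(max_len, dim)  # one vector per time step

    def forward(self, token_ids):              # token_ids: [batch, seq_len]
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.pos(positions)  # broadcast over batch
```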
no code implementations • 15 Apr 2022 • Md Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models.
1 code implementation • Findings (ACL) 2022 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Ali Ghodsi
From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA.
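A hedged sketch of the "keep the maximal-loss augmented samples" idea described above; the pool construction, model interface, and subset size are placeholders rather than the authors' exact procedure.

```python
# Select the k augmented samples with the highest loss from a
# pre-generated pool (worst-case selection).
import torch
import torch.nn.functional as F

def select_worst_case(model, pooled_inputs, pooled_labels, k):
    model.eval()
    with torch.no_grad():
        logits = model(pooled_inputs)                        # [pool, classes]
        losses = F.cross_entropy(logits, pooled_labels,
                                 reduction="none")           # per-sample loss
    topk = torch.topk(losses, k).indices                     # worst-case subset
    return pooled_inputs[topk], pooled_labels[topk]
```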
1 code implementation • 8 Dec 2021 • Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais
Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception.
no code implementations • 9 Nov 2021 • David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh
We apply NATURE to common slot-filling and intent detection benchmarks and demonstrate that simple NATURE perturbations of the standard evaluation set can significantly degrade model performance.
no code implementations • 16 Oct 2021 • Tianda Li, Yassir El Mesbahi, Ivan Kobyzev, Ahmad Rashid, Atif Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
Pre-trained Language Models (PLMs) have been successful for a wide range of natural language processing (NLP) tasks.
no code implementations • 16 Oct 2021 • Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi
A case in point is that the best performing checkpoint of the teacher might not necessarily be the best teacher for training the student in KD.
no code implementations • ACL 2022 • Ali Edalati, Marzieh Tahaei, Ahmad Rashid, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh
GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance in several downstream tasks.
no code implementations • 29 Sep 2021 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model under training.
no code implementations • 21 Sep 2021 • Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart
To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model.
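A rough sketch of randomly pairing teacher intermediate layers with student layers and penalizing their distance, in the spirit of the random-layer idea above; the projection module, the MSE objective, and the assumption that the student has fewer layers than the teacher are illustrative choices, not the paper's exact formulation.

```python
# Random intermediate-layer matching sketch. `proj` maps student hidden
# size to teacher hidden size (e.g. an nn.Linear); assumes the student
# has no more layers than the teacher.
import random
import torch
import torch.nn as nn

def intermediate_layer_loss(teacher_hiddens, student_hiddens, proj):
    # teacher_hiddens: list of [batch, seq, d_t]; student_hiddens: list of [batch, seq, d_s]
    chosen = sorted(random.sample(range(len(teacher_hiddens)), len(student_hiddens)))
    loss = 0.0
    for s_idx, t_idx in enumerate(chosen):
        loss = loss + nn.functional.mse_loss(proj(student_hiddens[s_idx]),
                                             teacher_hiddens[t_idx])
    return loss
```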
no code implementations • WNUT (ACL) 2021 • Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Chengyang Li, Ali Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications.
no code implementations • 13 Sep 2021 • Marzieh S. Tahaei, Ella Charlaix, Vahid Partovi Nia, Ali Ghodsi, Mehdi Rezagholizadeh
We present our KroneckerBERT, a compressed version of the BERT_BASE model obtained using this framework.
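A toy illustration of the underlying compression idea, Kronecker factorization: a large weight matrix is represented as the Kronecker product of two much smaller factors, so only the factors need to be stored. The shapes below are arbitrary and are not the authors' parameterization of BERT_BASE.

```python
# Kronecker factorization: two small factors stand in for one large matrix.
import torch

a = torch.randn(16, 24)            # small factor A
b = torch.randn(48, 32)            # small factor B
w = torch.kron(a, b)               # behaves like a 768 x 768 weight matrix
print(w.shape)                     # torch.Size([768, 768])
print(a.numel() + b.numel(), "parameters instead of", w.numel())
```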
1 code implementation • 13 Sep 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one.
no code implementations • Findings (ACL) 2021 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Ahmad Rashid
Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones.
no code implementations • 24 Jul 2021 • Abbas Ghaddar, Philippe Langlais, Ahmad Rashid, Mehdi Rezagholizadeh
In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity.
1 code implementation • Findings (ACL) 2021 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi
We exploit a semi-supervised approach based on KD to train a model on augmented data.
1 code implementation • ACL 2021 • Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh
We present MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.
no code implementations • 18 Apr 2021 • Krtin Kumar, Peyman Passban, Mehdi Rezagholizadeh, Yiu Sing Lau, Qun Liu
Embedding matrices are key components of neural natural language processing (NLP) models, responsible for providing numerical representations of input tokens. (In this paper, words and subwords are referred to as tokens, and the term embedding refers only to embeddings of inputs.)
no code implementations • 17 Apr 2021 • Kira A. Selby, Yinong Wang, Ruizhe Wang, Peyman Passban, Ahmad Rashid, Mehdi Rezagholizadeh, Pascal Poupart
Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy domains.
no code implementations • EACL 2021 • Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi
Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model.
no code implementations • 17 Mar 2021 • Md Akmal Haidar, Chao Xing, Mehdi Rezagholizadeh
End-to-end automatic speech recognition (ASR), unlike conventional ASR, does not have modules to learn semantic representations from the speech encoder.
Ranked #9 on Speech Recognition on LibriSpeech test-clean
no code implementations • 10 Mar 2021 • Md Akmal Haidar, Mehdi Rezagholizadeh
In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective where the ASR model acts as a generator and a discriminator tries to distinguish the ASR output from the real data.
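A very loose sketch of the generator/discriminator roles described above: the ASR model plays the generator, and a discriminator scores real transcriptions against ASR outputs. All modules, representation sizes, and losses below are placeholders, not the paper's architecture.

```python
# Adversarial fine-tuning roles, sketched: discriminator distinguishes
# real transcription representations from ASR-produced ones; the ASR
# model (generator) is trained to fool it.
import torch
import torch.nn as nn

discriminator = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

def discriminator_step(real_repr, asr_repr):
    # real_repr / asr_repr: [batch, 256] sentence-level representations
    real_loss = bce(discriminator(real_repr), torch.ones(real_repr.size(0), 1))
    fake_loss = bce(discriminator(asr_repr.detach()), torch.zeros(asr_repr.size(0), 1))
    return real_loss + fake_loss

def generator_step(asr_repr):
    # generator objective: make ASR outputs look "real" to the discriminator
    return bce(discriminator(asr_repr), torch.ones(asr_repr.size(0), 1))
```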
no code implementations • 1 Jan 2021 • Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi
Augmenting the training set by adding this auxiliary data improves the performance of KD significantly and leads to a closer match between the student and the teacher.
no code implementations • EMNLP 2021 • Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions.
no code implementations • 27 Dec 2020 • Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu
Knowledge distillation is considered a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training.
no code implementations • 10 Nov 2020 • Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh
B-GAN is able to generate a distributed latent space representation which can be paired with an attention based decoder to generate fluent sentences.
no code implementations • 9 Nov 2019 • Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh
While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Gabriele Prato, Ella Charlaix, Mehdi Rezagholizadeh
State-of-the-art neural machine translation methods employ massive amounts of parameters.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md. Akmal Haidar, Mehdi Rezagholizadeh
Word-embeddings are vital components of Natural Language Processing (NLP) models and have been extensively explored.
no code implementations • 25 Sep 2019 • Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md Akmal Haidar, Mehdi Rezagholizadeh
Word-embeddings are a vital component of Natural Language Processing (NLP) systems and have been extensively researched.
1 code implementation • ACL 2019 • Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung
We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach.
Ranked #2 on Text Simplification on PWKP / WikiSmall (SARI metric)
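A toy interpreter for the explicit edit operations mentioned above (ADD, DELETE, KEEP): given a source token list and a predicted edit program, it produces the simplified sentence. The program format is invented for illustration and is not the paper's exact interface.

```python
# Apply a sequence of (operation, argument) edits to a token list.
def apply_edits(tokens, program):
    out, i = [], 0
    for op, arg in program:              # e.g. ("KEEP", None), ("ADD", "former")
        if op == "KEEP":
            out.append(tokens[i]); i += 1
        elif op == "DELETE":
            i += 1
        elif op == "ADD":
            out.append(arg)
    return out + tokens[i:]              # copy any remaining source tokens

print(apply_edits(["the", "erstwhile", "king"],
                  [("KEEP", None), ("DELETE", None), ("ADD", "former"), ("KEEP", None)]))
# ['the', 'former', 'king']
```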
1 code implementation • 23 Apr 2019 • Md. Akmal Haidar, Mehdi Rezagholizadeh
Text generation is of particular interest in many NLP applications such as machine translation, language modeling, and text summarization.
no code implementations • NAACL 2019 • Md. Akmal Haidar, Mehdi Rezagholizadeh, Alan Do-Omri, Ahmad Rashid
This soft representation will be used in GAN discrimination to synthesize similar soft-texts.
no code implementations • WS 2019 • Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh
Latent space based GAN methods and attention based sequence to sequence models have achieved impressive results in text generation and unsupervised machine translation respectively.
no code implementations • 13 Nov 2018 • Mehdi Rezagholizadeh, Md Akmal Haidar
We performed several experiments on a publicly available driving dataset to evaluate our proposed method, and the results are very promising.
no code implementations • 28 Sep 2018 • Jules Gagnon-Marchand, Hamed Sadeghi, Md. Akmal Haidar, Mehdi Rezagholizadeh
Inspired by the success of the self-attention mechanism and the Transformer architecture in sequence transduction and image generation applications, we propose novel self-attention-based architectures to improve the performance of adversarial latent-code-based schemes in text generation.