Search Results for author: Mehdi Rezagholizadeh

Found 41 papers, 7 papers with code

RW-KD: Sample-wise Loss Terms Re-Weighting for Knowledge Distillation

no code implementations Findings (EMNLP) 2021 Peng Lu, Abbas Ghaddar, Ahmad Rashid, Mehdi Rezagholizadeh, Ali Ghodsi, Philippe Langlais

Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language models.

Knowledge Distillation Natural Language Processing

Towards Understanding Label Regularization for Fine-tuning Pre-trained Language Models

no code implementations 25 May 2022 Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Ali Ghodsi, Pascal Poupart

Knowledge Distillation (KD) is a prominent neural model compression technique which heavily relies on teacher network predictions to guide the training of a student model.

Knowledge Distillation Model Compression
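
Since several of the entries on this page build on the same response-based KD objective (soft teacher predictions guiding the student), here is a minimal PyTorch sketch of that baseline loss; the temperature and mixing weight are illustrative defaults, not values taken from any of the papers listed.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard knowledge-distillation objective: a KL term on
    temperature-softened teacher/student distributions, mixed with
    ordinary cross-entropy on the gold labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures
    kl = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce
```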

Dynamic Position Encoding for Transformers

no code implementations 18 Apr 2022 Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban

To solve this problem, position embeddings are defined exclusively for each time step to enrich word information.

Machine Translation
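
For reference, the fixed per-time-step position embeddings this entry refers to are the standard sinusoidal encodings of the original Transformer; below is a minimal sketch of that baseline (not the paper's proposed dynamic variant).

```python
import numpy as np

def sinusoidal_position_encoding(max_len, d_model):
    """Vanilla Transformer position encoding: each time step gets a fixed
    vector of sines and cosines at geometrically spaced frequencies."""
    positions = np.arange(max_len)[:, None]          # (max_len, 1)
    dims = np.arange(d_model)[None, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                 # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])            # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])            # odd dims: cosine
    return pe
```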

JABER and SABER: Junior and Senior Arabic BERt

1 code implementation 8 Dec 2021 Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting; Arabic is no exception.

Language Modelling NER

NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation

no code implementations 9 Nov 2021 David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh

We apply NATURE to common slot-filling and intent detection benchmarks and demonstrate that simple NATURE perturbations of the standard evaluation set can significantly degrade model performance.

Intent Detection Slot Filling

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher

no code implementations 16 Oct 2021 Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi

A case in point is that the best-performing checkpoint of the teacher is not necessarily the best teacher for training the student in KD.

Knowledge Distillation Model Compression +2

Kronecker Decomposition for GPT Compression

no code implementations ACL 2022 Ali Edalati, Marzieh Tahaei, Ahmad Rashid, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh

GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance in several downstream tasks.

Knowledge Distillation Language Modelling +2

Pseudo Knowledge Distillation: Towards Learning Optimal Instance-specific Label Smoothing Regularization

no code implementations 29 Sep 2021 Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model under training.

Knowledge Distillation Natural Language Understanding
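
This paper frames KD as a form of instance-specific label smoothing; as background only, here is a sketch of plain uniform label smoothing, the baseline such work typically contrasts against. The smoothing factor is an illustrative choice, not a value from the paper.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, labels, epsilon=0.1):
    """Uniform label smoothing: mix the one-hot target with a uniform
    distribution over classes, then take cross-entropy."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(labels, num_classes).float()
    smoothed = (1.0 - epsilon) * one_hot + epsilon / num_classes
    return -(smoothed * log_probs).sum(dim=-1).mean()
```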

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation

no code implementations 21 Sep 2021 Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart

To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model.

Knowledge Distillation
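
A rough sketch of the random intermediate-layer mapping described above; the per-call resampling, ordered layer selection, and MSE matching loss are assumptions made for illustration rather than details confirmed by the paper.

```python
import random
import torch.nn.functional as F

def rail_kd_intermediate_loss(teacher_hidden, student_hidden):
    """Randomly pick as many teacher layers as the student has, keep their
    order, and match each teacher/student pair of hidden states with MSE.
    Both arguments are lists of (batch, seq_len, dim) tensors; hidden
    dimensions are assumed to already agree."""
    picked = sorted(random.sample(range(len(teacher_hidden)), len(student_hidden)))
    loss = 0.0
    for s_layer, t_idx in zip(student_hidden, picked):
        loss = loss + F.mse_loss(s_layer, teacher_hidden[t_idx])
    return loss / len(student_hidden)
```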

End-to-End Self-Debiasing Framework for Robust NLU Training

no code implementations Findings (ACL) 2021 Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Ahmad Rashid

Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones.

Natural Language Understanding

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation

1 code implementation ACL 2021 Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh

We present MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.

Adversarial Text Data Augmentation +3

From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables

no code implementations 18 Apr 2021 Krtin Kumar, Peyman Passban, Mehdi Rezagholizadeh, Yiu Sing Lau, Qun Liu

Embedding matrices are key components in neural natural language processing (NLP) models, responsible for providing numerical representations of input tokens. (In this paper, words and subwords are referred to as tokens, and the term embedding refers only to embeddings of inputs.)

Machine Translation Natural Language Processing +2

Robust Embeddings Via Distributions

no code implementations 17 Apr 2021 Kira A. Selby, Yinong Wang, Ruizhe Wang, Peyman Passban, Ahmad Rashid, Mehdi Rezagholizadeh, Pascal Poupart

Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy domains.

Natural Language Processing

Annealing Knowledge Distillation

no code implementations EACL 2021 Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model.

Knowledge Distillation Model Compression

Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks

no code implementations 10 Mar 2021 Md Akmal Haidar, Mehdi Rezagholizadeh

In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective where the ASR model acts as a generator and a discriminator tries to distinguish the ASR output from the real data.

Speech Recognition
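
A schematic of the adversarial fine-tuning objective described above, with the ASR model acting as the generator and a discriminator separating real transcriptions from ASR outputs; the binary cross-entropy formulation and tensor shapes are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def gan_finetune_step(asr_model, discriminator, audio, real_text_emb):
    """One illustrative adversarial step: the discriminator learns to
    separate real transcription embeddings from ASR outputs, and the ASR
    model (generator) is pushed to fool it."""
    fake_text_emb = asr_model(audio)                      # (batch, dim)
    real_score = discriminator(real_text_emb)
    fake_score = discriminator(fake_text_emb.detach())    # no generator grad
    d_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    g_score = discriminator(fake_text_emb)
    g_loss = F.binary_cross_entropy_with_logits(g_score, torch.ones_like(g_score))
    return d_loss, g_loss
```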

Improved knowledge distillation by utilizing backward pass knowledge in neural networks

no code implementations 1 Jan 2021 Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

Augmenting the training set with this auxiliary data improves the performance of KD significantly and leads to a closer match between the student and the teacher.

Knowledge Distillation Model Compression +1

Towards Zero-Shot Knowledge Distillation for Natural Language Processing

no code implementations EMNLP 2021 Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh

Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions.

Knowledge Distillation Model Compression +2

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

no code implementations 27 Dec 2020 Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu

Knowledge distillation is a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training.

Knowledge Distillation

From Unsupervised Machine Translation To Adversarial Text Generation

no code implementations 10 Nov 2020 Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh

B-GAN is able to generate a distributed latent space representation which can be paired with an attention based decoder to generate fluent sentences.

Adversarial Text Text Generation +2

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

no code implementations 9 Nov 2019 Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.

Automatic Speech Recognition

EditNTS: A Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing

1 code implementation ACL 2019 Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung

We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach.

Machine Translation Text Simplification +1
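
To make the explicit edit programs concrete, here is a toy interpreter that applies a predicted ADD/DELETE/KEEP sequence to a source sentence; the (op, token) program encoding is a simplification for illustration, not the paper's exact representation.

```python
def apply_edit_program(source_tokens, program):
    """Apply an (op, token) edit program to a source sentence.
    KEEP copies the next source token, DELETE skips it, and
    ('ADD', w) inserts w without consuming source input."""
    out, i = [], 0
    for op, word in program:
        if op == "KEEP":
            out.append(source_tokens[i]); i += 1
        elif op == "DELETE":
            i += 1
        elif op == "ADD":
            out.append(word)
    return out

# e.g. simplify "the feline consumed the rodent" -> "the cat ate the rat"
tokens = ["the", "feline", "consumed", "the", "rodent"]
program = [("KEEP", None), ("DELETE", None), ("ADD", "cat"),
           ("DELETE", None), ("ADD", "ate"), ("KEEP", None),
           ("DELETE", None), ("ADD", "rat")]
print(apply_edit_program(tokens, program))  # ['the', 'cat', 'ate', 'the', 'rat']
```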

TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks

1 code implementation 23 Apr 2019 Md. Akmal Haidar, Mehdi Rezagholizadeh

Text generation is of particular interest in many NLP applications such as machine translation, language modeling, and text summarization.

Image Generation Knowledge Distillation +5

Bilingual-GAN: A Step Towards Parallel Text Generation

no code implementations WS 2019 Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh

Latent space based GAN methods and attention based sequence to sequence models have achieved impressive results in text generation and unsupervised machine translation respectively.

Denoising Text Generation +2

Semi-Supervised Regression with Generative Adversarial Networks for End to End Learning in Autonomous Driving

no code implementations 13 Nov 2018 Mehdi Rezagholizadeh, Md Akmal Haidar

We performed several experiments on a publicly available driving dataset to evaluate our proposed method, and the results are very promising.

Autonomous Driving

SALSA-TEXT: self attentive latent space based adversarial text generation

no code implementations 28 Sep 2018 Jules Gagnon-Marchand, Hamed Sadeghi, Md. Akmal Haidar, Mehdi Rezagholizadeh

Inspired by the success of the self-attention mechanism and the Transformer architecture in sequence transduction and image generation applications, we propose novel self-attention-based architectures to improve the performance of adversarial latent-code-based schemes in text generation.

Adversarial Text Image Generation +2
