Search Results for author: Mehdi Rezagholizadeh

Found 70 papers, 16 papers with code

JABER and SABER: Junior and Senior Arabic BERt

1 code implementation 8 Dec 2021 Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting; Arabic is no exception.

Language Modelling NER

On the importance of Data Scale in Pretraining Arabic Language Models

1 code implementation 15 Jan 2024 Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Boxing Chen

Pretraining monolingual language models has proven to be vital for performance on Arabic Natural Language Processing (NLP) tasks.

Language Modelling

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation

2 code implementations 14 Oct 2022 Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi

Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training.
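
A minimal sketch of the rank-sampling idea described above, assuming a PyTorch LoRA-style linear layer; the class name `DynamicLoRALinear`, the rank range, and the scaling are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of training LoRA factors for a
# range of ranks: a rank b is sampled per forward pass, so the first b rows/columns
# of the low-rank factors must stay useful for every b in the range.
import torch
import torch.nn as nn


class DynamicLoRALinear(nn.Module):  # hypothetical name
    def __init__(self, in_features, out_features, max_rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # frozen pre-trained weight
        self.lora_A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.max_rank = max_rank
        self.scaling = alpha / max_rank

    def forward(self, x, rank=None):
        # Sample a rank during training; at inference any rank in the range can be used.
        if rank is None:
            rank = torch.randint(1, self.max_rank + 1, (1,)).item()
        A, B = self.lora_A[:rank], self.lora_B[:, :rank]
        return self.base(x) + (x @ A.t() @ B.t()) * self.scaling


layer = DynamicLoRALinear(768, 768)
out = layer(torch.randn(2, 10, 768))  # uses a randomly sampled rank
```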

Natural Language Understanding Text Generation

Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

1 code implementation 18 Oct 2022 Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, Jimmy Lin

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world.

Information Retrieval Retrieval

Annealing Knowledge Distillation

1 code implementation EACL 2021 Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model.
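
For context, a standard temperature-scaled KD objective (the Hinton-style formulation the snippet refers to); the annealing schedule that is specific to this paper is not reproduced here.

```python
# Standard temperature-scaled knowledge distillation loss, shown only to make the
# teacher-to-student transfer concrete; Annealing KD changes how the teacher signal
# is scheduled over training, which this sketch does not implement.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # soften both distributions by T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


loss = kd_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))
```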

Image Classification Knowledge Distillation +1

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation

1 code implementation ACL 2021 Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh

We present MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.

Adversarial Text Data Augmentation +2

When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation

1 code implementation Findings (ACL) 2022 Ehsan Kamalloo, Mehdi Rezagholizadeh, Ali Ghodsi

From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA.
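
A rough sketch of the selection step described above, under the assumption of a classification setup; the function name and shapes are illustrative, not the paper's code.

```python
# From a pool of pre-generated augmented samples, keep only the top-k with the
# highest current loss ("worst-case" subset), analogous to adversarial data augmentation.
import torch
import torch.nn.functional as F


@torch.no_grad()
def select_worst_case(model, aug_inputs, aug_labels, k):
    logits = model(aug_inputs)
    losses = F.cross_entropy(logits, aug_labels, reduction="none")
    topk = torch.topk(losses, k).indices          # indices of maximal-loss samples
    return aug_inputs[topk], aug_labels[topk]


model = torch.nn.Linear(16, 4)                    # stand-in for the task model
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
xs, ys = select_worst_case(model, x, y, k=8)
```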

Data Augmentation Knowledge Distillation

EditNTS: A Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing

1 code implementation ACL 2019 Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung

We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach.
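
To make the explicit edit operations concrete, here is a small interpreter that applies an ADD/DELETE/KEEP program to a token sequence; the neural model that predicts these operations is not shown, and the function is illustrative rather than the paper's implementation.

```python
# Apply an explicit edit program (ADD, DELETE, KEEP) to an input token sequence.
def apply_edits(tokens, ops):
    """ops: list of ("KEEP",), ("DELETE",) or ("ADD", new_token) actions,
    consumed left-to-right against the input tokens."""
    out, i = [], 0
    for op in ops:
        if op[0] == "ADD":
            out.append(op[1])          # insert a new token, do not consume input
        elif op[0] == "KEEP":
            out.append(tokens[i]); i += 1
        elif op[0] == "DELETE":
            i += 1                     # drop the current input token
    out.extend(tokens[i:])             # pass through any remaining tokens
    return out


print(apply_edits(["the", "cat", "sat"],
                  [("KEEP",), ("DELETE",), ("ADD", "dog"), ("KEEP",)]))
# ['the', 'dog', 'sat']
```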

Machine Translation Sentence +2

Resonance RoPE: Improving Context Length Generalization of Large Language Models

1 code implementation 29 Feb 2024 Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu

This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences.
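
For reference, one common formulation of plain RoPE, which is where the OOD position problem arises; the Resonance refinement itself (how wavelengths/features are adjusted) is not reproduced here, and the helper below is an assumption-laden sketch.

```python
# Plain rotary position embedding (RoPE): rotate feature pairs by a position-dependent
# angle. Positions beyond the training length produce angles the model never saw,
# which is the out-of-distribution issue discussed above.
import torch


def rope(x, base=10000.0):
    # x: (batch, seq_len, dim) with even dim
    b, seq_len, dim = x.shape
    half = dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()          # (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


q = rope(torch.randn(1, 16, 64))
```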

Language Modelling Position

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

1 code implementation 18 Dec 2023 Nandan Thakur, Luiz Bonifacio, Xinyu Zhang, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Boxing Chen, Mehdi Rezagholizadeh, Jimmy Lin

We measure LLM robustness using two metrics: (i) hallucination rate, measuring model tendency to hallucinate an answer, when the answer is not present in passages in the non-relevant subset, and (ii) error rate, measuring model inaccuracy to recognize relevant passages in the relevant subset.
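
A back-of-the-envelope sketch of the two metrics described above, assuming binary per-query decisions ("answered" vs. "flagged as unanswerable" / "relevant passage recognized"); the exact bookkeeping in the paper may differ.

```python
# Hallucination rate: fraction of non-relevant-subset queries where the model still
# produced an answer. Error rate: fraction of relevant-subset queries where the model
# failed to recognize the relevant passage.
def hallucination_rate(answered_on_nonrelevant):
    return sum(answered_on_nonrelevant) / len(answered_on_nonrelevant)


def error_rate(recognized_on_relevant):
    return 1.0 - sum(recognized_on_relevant) / len(recognized_on_relevant)


print(hallucination_rate([1, 0, 1, 1]), error_rate([1, 1, 0, 1]))  # 0.75 0.25
```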

Hallucination Language Modelling +2

TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks

1 code implementation 23 Apr 2019 Md. Akmal Haidar, Mehdi Rezagholizadeh

Text generation is of particular interest in many NLP applications such as machine translation, language modeling, and text summarization.

Image Generation Knowledge Distillation +5

Learning Functions on Multiple Sets using Multi-Set Transformers

1 code implementation 30 Jun 2022 Kira Selby, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart

We propose a general deep architecture for learning functions on multiple permutation-invariant sets.

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

4 code implementations 9 Nov 2019 Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

SALSA-TEXT: self attentive latent space based adversarial text generation

no code implementations 28 Sep 2018 Jules Gagnon-Marchand, Hamed Sadeghi, Md. Akmal Haidar, Mehdi Rezagholizadeh

Inspired by the success of the self-attention mechanism and the Transformer architecture in sequence transduction and image generation applications, we propose novel self-attention-based architectures to improve the performance of adversarial latent code-based schemes in text generation.

Adversarial Text Image Generation +3

Bilingual-GAN: A Step Towards Parallel Text Generation

no code implementations WS 2019 Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh

Latent space based GAN methods and attention based sequence to sequence models have achieved impressive results in text generation and unsupervised machine translation respectively.

Denoising Text Generation +2

From Unsupervised Machine Translation To Adversarial Text Generation

no code implementations 10 Nov 2020 Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh

B-GAN is able to generate a distributed latent space representation which can be paired with an attention based decoder to generate fluent sentences.

Adversarial Text Text Generation +2

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

no code implementations 27 Dec 2020 Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu

Knowledge distillation is considered a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training.

Knowledge Distillation

Towards Zero-Shot Knowledge Distillation for Natural Language Processing

no code implementations EMNLP 2021 Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh

Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions.

Knowledge Distillation Model Compression +1

Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks

no code implementations 10 Mar 2021 Md Akmal Haidar, Mehdi Rezagholizadeh

In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective where the ASR model acts as a generator and a discriminator tries to distinguish the ASR output from the real data.

speech-recognition Speech Recognition

Semi-Supervised Regression with Generative Adversarial Networks for End to End Learning in Autonomous Driving

no code implementations 13 Nov 2018 Mehdi Rezagholizadeh, Md Akmal Haidar

We performed several experiments on a publicly available driving dataset to evaluate our proposed method, and the results are very promising.

Autonomous Driving regression

From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables

no code implementations 18 Apr 2021 Krtin Kumar, Peyman Passban, Mehdi Rezagholizadeh, Yiu Sing Lau, Qun Liu

Embedding matrices are key components in neural natural language processing (NLP) models that are responsible for providing numerical representations of input tokens. (In this paper, words and subwords are referred to as "tokens", and the term "embedding" refers only to embeddings of inputs.)

Machine Translation NMT +2

Robust Embeddings Via Distributions

no code implementations 17 Apr 2021 Kira A. Selby, Yinong Wang, Ruizhe Wang, Peyman Passban, Ahmad Rashid, Mehdi Rezagholizadeh, Pascal Poupart

Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy domains.

End-to-End Self-Debiasing Framework for Robust NLU Training

no code implementations Findings (ACL) 2021 Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Ahmad Rashid

Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones.

Natural Language Understanding

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation

no code implementations Findings (NAACL) 2022 Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart

To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model.
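
A sketch of the random layer-mapping idea under simple assumptions (one shared projection from teacher to student width, MSE between hidden states); the exact sampling schedule and projections of the paper are not reproduced.

```python
# Randomly choose which teacher layers are distilled into the student's intermediate
# layers, then penalize the mismatch between the mapped teacher states and the
# student states. Shapes and the projection choice below are illustrative assumptions.
import random
import torch
import torch.nn.functional as F


def rail_kd_loss(teacher_hiddens, student_hiddens, proj):
    """teacher_hiddens: list of (batch, seq, d_t); student_hiddens: list of (batch, seq, d_s);
    proj: nn.Linear(d_t, d_s) mapping teacher states into the student space."""
    k = len(student_hiddens)
    picked = sorted(random.sample(range(len(teacher_hiddens)), k))  # random teacher layers
    loss = 0.0
    for t_idx, s_h in zip(picked, student_hiddens):
        loss = loss + F.mse_loss(s_h, proj(teacher_hiddens[t_idx]))
    return loss / k


proj = torch.nn.Linear(1024, 768)
teacher = [torch.randn(2, 8, 1024) for _ in range(12)]
student = [torch.randn(2, 8, 768) for _ in range(4)]
print(rail_kd_loss(teacher, student, proj))
```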

Knowledge Distillation

Pseudo Knowledge Distillation: Towards Learning Optimal Instance-specific Label Smoothing Regularization

no code implementations 29 Sep 2021 Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model under training.

Image Classification Knowledge Distillation +1

Kronecker Decomposition for GPT Compression

no code implementations ACL 2022 Ali Edalati, Marzieh Tahaei, Ahmad Rashid, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh

GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance in several downstream tasks.

Knowledge Distillation Language Modelling +1

RW-KD: Sample-wise Loss Terms Re-Weighting for Knowledge Distillation

no code implementations Findings (EMNLP) 2021 Peng Lu, Abbas Ghaddar, Ahmad Rashid, Mehdi Rezagholizadeh, Ali Ghodsi, Philippe Langlais

Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language models.

Knowledge Distillation

NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation

no code implementations 9 Nov 2021 David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh

We apply NATURE to common slot-filling and intent detection benchmarks and demonstrate that simple perturbations from the standard evaluation set by NATURE can deteriorate model performance significantly.

Intent Detection slot-filling +1

Dynamic Position Encoding for Transformers

no code implementations COLING 2022 Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban

To solve this problem, position embeddings are defined exclusively for each time step to enrich word information.
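
For reference, the standard sinusoidal position table that the snippet alludes to (one row per time step, added to the token embeddings); the paper's dynamic position encoding, which adapts positions for translation, is not shown here.

```python
# Vanilla Transformer sinusoidal position encoding: each time step gets its own
# fixed vector of sines and cosines at geometrically spaced frequencies.
import torch


def sinusoidal_positions(seq_len, dim, base=10000.0):
    # assumes even dim
    pos = torch.arange(seq_len, dtype=torch.float32)[:, None]
    i = torch.arange(0, dim, 2, dtype=torch.float32)[None, :]
    angles = pos / (base ** (i / dim))
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe


pe = sinusoidal_positions(128, 512)   # added to the embeddings of a 128-token sequence
```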

Machine Translation NMT +1

Do we need Label Regularization to Fine-tune Pre-trained Language Models?

no code implementations 25 May 2022 Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.

Knowledge Distillation Model Compression

Towards Fine-tuning Pre-trained Language Models with Integer Forward and Backward Propagation

no code implementations 20 Sep 2022 Mohammadreza Tayaranian, Alireza Ghaffari, Marzieh S. Tahaei, Mehdi Rezagholizadeh, Masoud Asgharian, Vahid Partovi Nia

Previously, researchers focused on lower bit-width integer data types for the forward propagation of language models to save memory and computation.

Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement

no code implementations 12 Nov 2022 Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

Self-supervised speech representation learning aims to extract meaningful factors from the speech signal that can later be used across different downstream tasks, such as speech and/or emotion recognition.

Data Augmentation Emotion Recognition +2

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

no code implementations 12 Dec 2022 Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher).

Knowledge Distillation Natural Language Understanding

KronA: Parameter Efficient Tuning with Kronecker Adapter

no code implementations 20 Dec 2022 Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh

We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
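
A generic sketch of a Kronecker-product adapter, assuming the weight update is the Kronecker product of two small trainable factors; the class name and factor shapes are illustrative assumptions, not the KronA recipe itself.

```python
# Parameter-efficient update: delta W = kron(A, B), where A and B together have far
# fewer parameters than the full weight matrix they reconstruct.
import torch
import torch.nn as nn


class KroneckerAdapter(nn.Module):  # hypothetical name
    def __init__(self, base_linear, a1, a2):
        super().__init__()
        out_f, in_f = base_linear.weight.shape
        assert out_f % a1 == 0 and in_f % a2 == 0
        self.base = base_linear
        self.base.weight.requires_grad_(False)            # frozen pre-trained weight
        self.A = nn.Parameter(torch.zeros(a1, a2))        # zero init -> no change at start
        self.B = nn.Parameter(torch.randn(out_f // a1, in_f // a2) * 0.01)

    def forward(self, x):
        delta = torch.kron(self.A, self.B)                # (out_f, in_f) weight update
        return self.base(x) + x @ delta.t()


adapter = KroneckerAdapter(nn.Linear(768, 768), a1=16, a2=16)
y = adapter(torch.randn(2, 768))
```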

Language Modelling

Improved knowledge distillation by utilizing backward pass knowledge in neural networks

no code implementations 27 Jan 2023 Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

Augmenting the training set with this auxiliary data improves the performance of KD significantly and leads to a closer match between the student and the teacher.

Knowledge Distillation Model Compression

RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness

no code implementations 18 Feb 2023 Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

The proposed layer-wise distillation recipe is evaluated on top of three well-established universal representations, as well as with three downstream tasks.

Knowledge Distillation Multi-Task Learning

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

no code implementations 3 Apr 2023 Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang

The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.

Cross-Lingual Information Retrieval Retrieval

LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

no code implementations 8 May 2023 Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Label Smoothing (LS) is another simple, versatile, and efficient regularization technique that can be applied to various supervised classification tasks.
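
For reference, standard label smoothing as used in supervised classification; LABO instead learns the smoothing via bi-level optimization, which is not shown in this sketch.

```python
# Cross-entropy against a smoothed target: the true class gets 1 - eps, the remaining
# mass eps is spread uniformly over the other classes.
import torch
import torch.nn.functional as F


def label_smoothing_ce(logits, labels, eps=0.1):
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / (num_classes - 1))
    smooth.scatter_(-1, labels.unsqueeze(-1), 1.0 - eps)   # soft target distribution
    return -(smooth * log_probs).sum(dim=-1).mean()


loss = label_smoothing_ce(torch.randn(8, 5), torch.randint(0, 5, (8,)))
```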

Image Classification Machine Translation

Evaluating Embedding APIs for Information Retrieval

no code implementations 10 May 2023 Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-Hermelo, Mehdi Rezagholizadeh, Jimmy Lin

The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs.

Domain Generalization Information Retrieval +2

Multimodal Audio-textual Architecture for Robust Spoken Language Understanding

no code implementations 12 Jun 2023 Anderson R. Avila, Mehdi Rezagholizadeh, Chao Xing

In this work, we investigate impacts of this ASR error propagation on state-of-the-art NLU systems based on pre-trained language models (PLM), such as BERT and RoBERTa.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

no code implementations 11 Jun 2023 Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar

Developing monolingual large Pre-trained Language Models (PLMs) has been shown to be very successful in handling different tasks in Natural Language Processing (NLP).

Few-Shot Learning

Attribute Controlled Dialogue Prompting

no code implementations 11 Jul 2023 Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart

Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks.

Attribute Dialogue Generation

SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

no code implementations 1 Sep 2023 Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi

Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous specific models.

Image Classification Model Selection

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference

no code implementations 16 Sep 2023 Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

We extend SortedNet to generative NLP tasks, making large language models dynamic without any Pre-Training and by only replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT).
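
A toy illustration of the many-in-one training idea, under the assumption that sub-models are obtained by truncating layers and sharing one prediction head; this is a sketch of the general principle, not the SoFT recipe or the LLaMA setup.

```python
# Train a single network so that every truncated depth yields usable predictions:
# forward through the first d layers, apply the shared head, and sum the losses
# over the depths trained at this step.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TruncatableNet(nn.Module):
    def __init__(self, dim=64, n_layers=6, n_classes=4):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))
        self.head = nn.Linear(dim, n_classes)             # shared across all depths

    def forward(self, x, depth):
        for layer in self.layers[:depth]:
            x = torch.relu(layer(x))
        return self.head(x)


net = TruncatableNet()
x, y = torch.randn(16, 64), torch.randint(0, 4, (16,))
loss = sum(F.cross_entropy(net(x, d), y) for d in range(1, len(net.layers) + 1))
loss.backward()
```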

Instruction Following Question Answering +1

On the Impact of Quantization and Pruning of Self-Supervised Speech Models for Downstream Speech Recognition Tasks "In-the-Wild"

no code implementations 25 Sep 2023 Arthur Pimentel, Heitor Guimarães, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

Recent advances with self-supervised learning have allowed speech recognition systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring only a fraction of the labeled training data needed by their predecessors.

Data Augmentation Model Compression +4

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

no code implementations 3 Feb 2024 Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi

Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses.

Logical Reasoning Long-Context Understanding

Towards Practical Tool Usage for Continually Learning LLMs

no code implementations 14 Apr 2024 Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar

Large language models (LLMs) show an innate skill for solving language-based tasks.
