no code implementations • AMTA 2022 • Shivendra Bhardwaj, David Alfonso-Hermelo, Philippe Langlais, Gabriel Bernier-Colborne, Cyril Goutte, Michel Simard
While recent studies have been dedicated to cleaning very noisy parallel corpora to improve Machine Translation training, we focus in this work on filtering a large and mostly clean Translation Memory.
no code implementations • RANLP 2021 • Olivier Salaün, Philippe Langlais, Karim Benyekhlef
Legal judgment prediction (LJP) is usually framed as a text classification task: predicting the verdict from the description of the facts.
no code implementations • Findings (EMNLP) 2021 • Peng Lu, Abbas Ghaddar, Ahmad Rashid, Mehdi Rezagholizadeh, Ali Ghodsi, Philippe Langlais
Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language models.
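To make the distillation objective mentioned above concrete, here is a minimal sketch of the classic KD loss (a weighted sum of hard-label cross-entropy and temperature-softened teacher/student KL divergence). This is an illustrative toy, not the paper's implementation; all function names and defaults are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, true_label,
            temperature=2.0, alpha=0.5):
    """Classic KD objective: alpha * cross-entropy with the hard label
    plus (1 - alpha) * KL divergence between the temperature-softened
    teacher and student distributions."""
    p_student = softmax(student_logits)
    ce = -math.log(p_student[true_label])
    pt = softmax(teacher_logits, temperature)
    ps = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(pt, ps))
    # The T^2 factor keeps soft-target gradients on a comparable
    # scale across different temperatures.
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kl
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the hard-label term remains.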
no code implementations • LREC 2022 • David Kletz, Philippe Langlais, François Lareau, Patrick Drouin
Different algorithms have been proposed to detect semantic shifts (changes in a word meaning over time) in a diachronic corpus.
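A common way to operationalize semantic shift detection, sketched below, is to train separate embeddings on an early and a late time slice and measure the cosine distance between a word's two vectors. This sketch assumes the two embedding spaces have already been aligned; it is illustrative, not any specific algorithm from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def semantic_shift(vecs_period1, vecs_period2, word):
    """Shift score for one word: cosine distance between its embedding
    in the earlier corpus slice and in the later one (assumes the two
    spaces are already aligned, e.g. via Procrustes)."""
    return 1.0 - cosine(vecs_period1[word], vecs_period2[word])
```

A word whose vector is unchanged across the two periods scores 0; an orthogonal pair scores 1, flagging a candidate meaning change.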
no code implementations • ACL 2022 • Guillaume Le Berre, Christophe Cerisara, Philippe Langlais, Guy Lapalme
Pre-trained models have shown very good performances on a number of question answering benchmarks especially when fine-tuned on multiple question answering datasets at once.
no code implementations • COLING 2022 • Frédéric Piedboeuf, Philippe Langlais
In recent years, data augmentation has become an important field of machine learning.
no code implementations • 29 Jan 2025 • Elie Antoine, Frédéric Béchet, Géraldine Damnati, Philippe Langlais
We introduce an evaluation methodology for reading comprehension tasks based on the intuition that certain examples, by virtue of their linguistic complexity, consistently yield lower scores regardless of model size or architecture.
no code implementations • 22 Dec 2024 • Elie Antoine, Frédéric Béchet, Philippe Langlais
This study investigates the behavior of model-integrated routers in Mixture of Experts (MoE) models, focusing on how tokens are routed based on their linguistic features, specifically Part-of-Speech (POS) tags.
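The routing mechanism under study can be sketched as a top-1 router: each expert owns a weight vector, the token goes to the expert with the highest dot-product score, and the softmax over scores gives the gate value. This is a generic illustration of MoE routing, not the paper's code; the function name and shapes are assumptions.

```python
import math

def top1_route(token_embedding, expert_weights):
    """Top-1 MoE router: score the token against each expert's weight
    vector, softmax the scores into gates, and dispatch the token to
    the highest-scoring expert."""
    scores = [sum(w * x for w, x in zip(expert, token_embedding))
              for expert in expert_weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    gates = [e / total for e in exps]
    expert_id = max(range(len(scores)), key=lambda i: scores[i])
    return expert_id, gates[expert_id]
```

Tallying `expert_id` per POS tag over a corpus is one simple way to test whether, say, nouns and verbs are routed to systematically different experts.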
no code implementations • 23 Jul 2024 • Fabrice Lamarche, Philippe Langlais
Open Information Extraction (OIE) is a field of natural language processing that aims to present textual information in a format that allows it to be organized, analyzed and reflected upon.
no code implementations • 24 May 2024 • Abbas Ghaddar, David Alfonso-Hermelo, Philippe Langlais, Mehdi Rezagholizadeh, Boxing Chen, Prasanna Parthasarathi
In this work, we dive deep into one of the popular knowledge-grounded dialogue benchmarks that focus on faithfulness, FaithDial.
1 code implementation • 1 Mar 2024 • Olivier Salaün, Frédéric Piedboeuf, Guillaume Le Berre, David Alfonso Hermelo, Philippe Langlais
Keyphrase generation has primarily been explored within the context of academic research articles, with a particular focus on scientific domains and the English language.
no code implementations • 22 Feb 2024 • Frédéric Piedboeuf, Philippe Langlais
Textual data augmentation (DA) is a prolific field of study in which novel techniques for creating artificial data are regularly proposed; it has demonstrated great efficiency in small-data settings, at least for text classification tasks.
1 code implementation • 15 Jan 2024 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Boxing Chen
Pretraining monolingual language models has proven vital for performance in Arabic Natural Language Processing (NLP) tasks.
no code implementations • 8 May 2023 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Label Smoothing (LS) is a simple, versatile, and efficient regularization technique that can be applied to various supervised classification tasks.
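As a concrete illustration of label smoothing, the sketch below replaces the one-hot target with a mixture of the true class and a uniform distribution before computing cross-entropy. This is the standard textbook formulation, not code from the paper; names and the default epsilon are assumptions.

```python
import math

def smooth_labels(num_classes, true_label, epsilon=0.1):
    """Smoothed target: (1 - epsilon) of the mass on the true class,
    epsilon spread uniformly over all classes."""
    uniform = epsilon / num_classes
    target = [uniform] * num_classes
    target[true_label] += 1.0 - epsilon
    return target

def smoothed_cross_entropy(logits, true_label, epsilon=0.1):
    # Cross-entropy of the model's log-probabilities against the
    # smoothed target distribution.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    log_probs = [math.log(e / total) for e in exps]
    target = smooth_labels(len(logits), true_label, epsilon)
    return -sum(t * lp for t, lp in zip(target, log_probs))
```

With `epsilon=0` this reduces to plain cross-entropy; nonzero epsilon penalizes overconfident predictions on the true class.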
no code implementations • 12 Dec 2022 • Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais
Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
no code implementations • 21 May 2022 • Abbas Ghaddar, Yimeng Wu, Sunyam Bagga, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais
There is a growing body of work in recent years to develop pre-trained language models (PLMs) for the Arabic language.
no code implementations • COLING 2022 • Md Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models.
1 code implementation • 8 Dec 2021 • Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais
Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting; Arabic is no exception.
no code implementations • 9 Nov 2021 • David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh
We apply NATURE to common slot-filling and intent detection benchmarks and demonstrate that simple perturbations from the standard evaluation set by NATURE can deteriorate model performance significantly.
no code implementations • 29 Sep 2021 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model under training.
no code implementations • WNUT (ACL) 2021 • Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Chengyang Li, Ali Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications.
no code implementations • Findings (NAACL) 2022 • Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart
To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are randomly selected for distillation into the intermediate layers of the student model.
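The random layer selection described above can be sketched as follows: at each training step, sample a fresh subset of teacher layers and pair them, in order, with the student's intermediate layers. This is a minimal reading of the abstract, not the authors' implementation; the function name and seeding are assumptions.

```python
import random

def rail_kd_mapping(num_teacher_layers, num_student_layers, seed=None):
    """Sample which teacher intermediate layers are distilled into the
    student's intermediate layers this step: pick num_student_layers
    distinct teacher layers at random, keep them in depth order, and
    pair them with student layers 0..num_student_layers-1."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(num_teacher_layers),
                               num_student_layers))
    return list(zip(chosen, range(num_student_layers)))
```

Re-sampling the mapping every step (rather than fixing one layer alignment) is what distinguishes this scheme from fixed intermediate-layer distillation.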
no code implementations • Findings (ACL) 2021 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Ahmad Rashid
Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones.
no code implementations • 24 Jul 2021 • Abbas Ghaddar, Philippe Langlais, Ahmad Rashid, Mehdi Rezagholizadeh
In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity.
no code implementations • NAACL 2019 • Peng Lu, Ting Bai, Philippe Langlais
Multi-task learning (MTL) has been studied recently for sequence labeling.
1 code implementation • WS 2019 • William Léchelle, Fabrizio Gotti, Philippe Langlais
We build a reference for the task of Open Information Extraction, on five documents.
Ranked #4 on Open Information Extraction on WiRe57
1 code implementation • COLING 2018 • Francis Grégoire, Philippe Langlais
Parallel sentence extraction is a task addressing the data sparsity problem found in multilingual natural language processing applications.
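A simple baseline for the extraction task described above is to embed sentences from both languages into a shared space and keep cross-lingual pairs whose cosine similarity clears a threshold. The sketch below shows that baseline only; the paper itself trains a siamese network to score pairs, and all names here are illustrative.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sentence embeddings.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def extract_parallel(src_vecs, tgt_vecs, threshold=0.8):
    """Score every source/target sentence-embedding pair and keep those
    above the threshold as candidate parallel sentences."""
    pairs = []
    for i, u in enumerate(src_vecs):
        for j, v in enumerate(tgt_vecs):
            score = cosine(u, v)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

The exhaustive pairwise loop is quadratic in corpus size; practical systems replace it with approximate nearest-neighbor search over the embeddings.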
1 code implementation • COLING 2018 • Abbas Ghaddar, Philippe Langlais
While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of gazetteers.
Ranked #23 on Named Entity Recognition (NER) on Ontonotes v5 (English) (using extra training data)
no code implementations • 28 Sep 2017 • Francis Grégoire, Philippe Langlais
Parallel sentence extraction is a task addressing the data sparsity problem found in multilingual natural language processing applications.
no code implementations • WS 2017 • Hongzheng Li, Philippe Langlais, Yaohong Jin
Implicit discourse connectives and relations are distributed more widely in Chinese texts; when translating into English, such connectives are usually made explicit.
no code implementations • WS 2017 • Francis Grégoire, Philippe Langlais
This paper describes our participation in BUCC 2017 shared task: identifying parallel sentences in comparable corpora.
no code implementations • LREC 2012 • Philippe Langlais, Patrick Drouin, Amélie Paulus, Eugénie Rompré Brodeur, Florent Cottin
In October 2009, the Quebec French part of the international sms4science project, called texto4science, was launched.