Search Results for author: Philippe Langlais

Found 32 papers, 5 papers with code

Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning

no code implementations • ACL 2022 • Guillaume Le Berre, Christophe Cerisara, Philippe Langlais, Guy Lapalme

Pre-trained models have shown very good performance on a number of question answering benchmarks, especially when fine-tuned on multiple question answering datasets at once.

Multiple-choice Question Answering +2

Effective Data Augmentation for Sentence Classification Using One VAE per Class

no code implementations • COLING 2022 • Frédéric Piedboeuf, Philippe Langlais

In recent years, data augmentation has become an important field of machine learning.

Binary Classification Data Augmentation +2

RW-KD: Sample-wise Loss Terms Re-Weighting for Knowledge Distillation

no code implementations • Findings (EMNLP) 2021 • Peng Lu, Abbas Ghaddar, Ahmad Rashid, Mehdi Rezagholizadeh, Ali Ghodsi, Philippe Langlais

Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language models.

Knowledge Distillation

A Methodology for Building a Diachronic Dataset of Semantic Shifts and its Application to QC-FR-Diac-V1.0, a Free Reference for French

no code implementations • LREC 2022 • David Kletz, Philippe Langlais, François Lareau, Patrick Drouin

Different algorithms have been proposed to detect semantic shifts (changes in a word's meaning over time) in a diachronic corpus.

Word Sense Disambiguation

Refining an Almost Clean Translation Memory Helps Machine Translation

no code implementations • AMTA 2022 • Shivendra Bhardwaj, David Alfonso-Hermelo, Philippe Langlais, Gabriel Bernier-Colborne, Cyril Goutte, Michel Simard

While recent studies have been dedicated to cleaning very noisy parallel corpora to improve Machine Translation training, we focus in this work on filtering a large and mostly clean Translation Memory.

Machine Translation Translation

Exploiting Domain-Specific Knowledge for Judgment Prediction Is No Panacea

no code implementations • RANLP 2021 • Olivier Salaün, Philippe Langlais, Karim Benyekhlef

Legal judgment prediction (LJP) usually consists in a text classification task aimed at predicting the verdict on the basis of the fact description.

Legal Reasoning text-classification +1

EUROPA: A Legal Multilingual Keyphrase Generation Dataset

no code implementations • 1 Mar 2024 • Olivier Salaün, Frédéric Piedboeuf, Guillaume Le Berre, David Alfonso-Hermelo, Philippe Langlais

Keyphrase generation has primarily been explored within the context of academic research articles, with a particular focus on scientific domains and the English language.

Keyphrase Generation

Data Augmentation is Dead, Long Live Data Augmentation

no code implementations • 22 Feb 2024 • Frédéric Piedboeuf, Philippe Langlais

Textual data augmentation (DA) is a prolific field of study where novel techniques to create artificial data are regularly proposed, and that has demonstrated great efficiency on small data settings, at least for text classification tasks.

Data Augmentation text-classification +1

On the importance of Data Scale in Pretraining Arabic Language Models

1 code implementation • 15 Jan 2024 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Boxing Chen

Pretraining monolingual language models has been proven to be vital for performance in Arabic Natural Language Processing (NLP) tasks.

Language Modelling

2,953

LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

no code implementations • 8 May 2023 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Label Smoothing (LS) is another simple, versatile and efficient regularization which can be applied to various supervised classification tasks.

Image Classification Machine Translation

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging

no code implementations • 12 Dec 2022 • Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais

Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.

Knowledge Distillation Question Answering +2

Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding

no code implementations • 21 May 2022 • Abbas Ghaddar, Yimeng Wu, Sunyam Bagga, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

In recent years, a growing body of work has aimed to develop pre-trained language models (PLMs) for the Arabic language.

Natural Language Understanding

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation

no code implementations • COLING 2022 • Md Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart

Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models.

Contrastive Learning Data Augmentation +1

JABER and SABER: Junior and Senior Arabic BERt

1 code implementation • 8 Dec 2021 • Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception.

Language Modelling NER

2,953

NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation

no code implementations • 9 Nov 2021 • David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh

We apply NATURE to common slot-filling and intent detection benchmarks and demonstrate that simple perturbations from the standard evaluation set by NATURE can deteriorate model performance significantly.

Intent Detection slot-filling +1

Pseudo Knowledge Distillation: Towards Learning Optimal Instance-specific Label Smoothing Regularization

no code implementations • 29 Sep 2021 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model under training.

Image Classification Knowledge Distillation +1

Knowledge Distillation with Noisy Labels for Natural Language Understanding

no code implementations • WNUT (ACL) 2021 • Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Chengyang Li, Ali Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh

Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications.

Knowledge Distillation Natural Language Understanding

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation

no code implementations • Findings (NAACL) 2022 • Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart

To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model.

Knowledge Distillation

End-to-End Self-Debiasing Framework for Robust NLU Training

no code implementations • Findings (ACL) 2021 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Ahmad Rashid

Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones.

Natural Language Understanding

Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition

no code implementations • 24 Jul 2021 • Abbas Ghaddar, Philippe Langlais, Ahmad Rashid, Mehdi Rezagholizadeh

In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity.

Data Augmentation named-entity-recognition +2

SC-LSTM: Learning Task-Specific Representations in Multi-Task Learning for Sequence Labeling

no code implementations • NAACL 2019 • Peng Lu, Ting Bai, Philippe Langlais

Multi-task learning (MTL) has been studied recently for sequence labeling.

Chunking Multi-Task Learning +4

WiRe57: A Fine-Grained Benchmark for Open Information Extraction

1 code implementation • WS 2019 • William Léchelle, Fabrizio Gotti, Philippe Langlais

We build a reference for the task of Open Information Extraction, on five documents.

Ranked #4 on Open Information Extraction on WiRe57

Open Information Extraction

Extracting Parallel Sentences with Bidirectional Recurrent Neural Networks to Improve Machine Translation

1 code implementation • COLING 2018 • Francis Grégoire, Philippe Langlais

Parallel sentence extraction is a task addressing the data sparsity problem found in multilingual natural language processing applications.

Feature Engineering Machine Translation +2

Robust Lexical Features for Improved Neural Network Named-Entity Recognition

1 code implementation • COLING 2018 • Abbas Ghaddar, Philippe Langlais

While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of gazetteers.

Ranked #22 on Named Entity Recognition (NER) on Ontonotes v5 (English) (using extra training data)

named-entity-recognition Named Entity Recognition +1

Revisiting the Task of Scoring Open IE Relations

no code implementations • LREC 2018 • William Léchelle, Philippe Langlais

Knowledge Base Completion Language Modelling +1

Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus

no code implementations • LREC 2018 • Abbas Ghaddar, Philippe Langlais

Entity Linking Entity Typing +4

A Deep Neural Network Approach To Parallel Sentence Extraction

no code implementations • 28 Sep 2017 • Francis Grégoire, Philippe Langlais

Parallel sentence extraction is a task addressing the data sparsity problem found in multilingual natural language processing applications.

Feature Engineering Machine Translation +3

Translating Implicit Discourse Connectives Based on Cross-lingual Annotation and Alignment

no code implementations • WS 2017 • Hongzheng Li, Philippe Langlais, Yaohong Jin

Implicit discourse connectives and relations are distributed more widely in Chinese texts; when translating into English, such connectives are usually translated explicitly.

Implicit Relations Machine Translation +1