no code implementations • WNUT (ACL) 2021 • Arij Riabi, Benoît Sagot, Djamé Seddah
Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages.
no code implementations • ACL (IWPT) 2021 • Gosse Bouma, Djamé Seddah, Daniel Zeman
We describe the second IWPT task on end-to-end parsing from raw text to Enhanced Universal Dependencies.
no code implementations • WS (NoDaLiDa) 2019 • José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski
This work compares the performance achieved by Phrase-Based Statistical Machine Translation (PB-SMT) systems and attention-based Neural Machine Translation (NMT) systems when translating User-Generated Content (UGC), as encountered in social media, from French to English.
no code implementations • CONSTRAINT (ACL) 2022 • Syrielle Montariol, Étienne Simon, Arij Riabi, Djamé Seddah
We propose our solution to the multimodal semantic role labeling task from the CONSTRAINT’22 workshop.
no code implementations • JEP/TALN/RECITAL 2022 • Arij Riabi, Syrielle Montariol, Djamé Seddah
Hate speech detection is a difficult task, as it requires deep cultural and contextual knowledge; the knowledge needed varies, among other factors, with the language of the speaker or the target of the content.
no code implementations • JEP/TALN/RECITAL 2022 • Benjamin Muller, Antonios Anastasopoulos, Benoît Sagot, Djamé Seddah
In this work, by comparing multilingual and monolingual models, we show that such models behave in multiple ways on unseen languages.
no code implementations • 25 Jun 2024 • Arij Riabi, Menel Mahamdi, Virginie Mouilleron, Djamé Seddah
Protecting privacy is essential when sharing data, particularly in the case of an online radicalization dataset that may contain personal information.
no code implementations • 23 Sep 2023 • Wissam Antoun, Benoît Sagot, Djamé Seddah
The research also explores model attribution, encompassing source model identification, model family classification, and model size classification, as well as quantization and watermarking detection.
no code implementations • 9 Jun 2023 • Wissam Antoun, Virginie Mouilleron, Benoît Sagot, Djamé Seddah
This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes.
no code implementations • 2 Jun 2023 • Wissam Antoun, Benoît Sagot, Djamé Seddah
In this paper, we introduce CamemBERTa, a French DeBERTa model that builds upon the DeBERTaV3 architecture and training objective.
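A hedged usage sketch of how such a model would typically be loaded for feature extraction with the standard transformers API; the checkpoint identifier "almanach/camemberta-base" is an assumption, so check the official release for the exact Hugging Face hub id.
```python
# Minimal sketch, assuming the CamemBERTa checkpoint is published on the
# Hugging Face hub as "almanach/camemberta-base" (identifier not confirmed here).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("almanach/camemberta-base")
model = AutoModel.from_pretrained("almanach/camemberta-base")

inputs = tokenizer("Le camembert est délicieux.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # one contextual embedding per token
```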
no code implementations • 24 Oct 2022 • Syrielle Montariol, Arij Riabi, Djamé Seddah
Zero-shot cross-lingual transfer learning has been shown to be highly challenging for tasks involving many linguistic specificities or when a cultural gap is present between languages, such as in hate speech detection.
1 code implementation • 22 Jun 2022 • Ghazi Felhi, Joseph Le Roux, Djamé Seddah
Starting from a deep probabilistic generative model with attention, we measure the interaction between latent variables and realizations of syntactic roles, and show that it is possible to obtain, without supervision, sentence representations in which different syntactic roles correspond to clearly identified, distinct latent variables.
1 code implementation • NAACL 2022 • Ghazi Felhi, Joseph Le Roux, Djamé Seddah
In the attention of Transformers, keys handle information selection while values specify what information is conveyed.
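A minimal NumPy sketch of scaled dot-product attention illustrating that split: the query–key scores determine which positions are selected, while the values determine what content is passed on. Shapes and numbers are illustrative only, not taken from the paper.
```python
# Scaled dot-product attention for a single query, written out in NumPy.
import numpy as np

def attention(q, K, V):
    scores = K @ q / np.sqrt(q.shape[-1])   # keys drive the selection
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over positions
    return weights @ V                      # values carry the content

rng = np.random.default_rng(0)
q = rng.normal(size=4)        # one query vector
K = rng.normal(size=(5, 4))   # keys for 5 positions
V = rng.normal(size=(5, 4))   # values for the same positions
print(attention(q, K, V))
```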
1 code implementation • ACL 2020 • Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg
The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science.
no code implementations • 26 Oct 2021 • Arij Riabi, Benoît Sagot, Djamé Seddah
Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages.
1 code implementation • WNUT (ACL) 2021 • José Carlos Rosales Núñez, Guillaume Wisniewski, Djamé Seddah
This work explores the capacity of character-based Neural Machine Translation to translate noisy User-Generated Content (UGC), with a strong focus on exploring the limits of such approaches to handle productive UGC phenomena, which, almost by definition, cannot be seen at training time.
no code implementations • WNUT (ACL) 2021 • José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski
This work takes a critical look at the evaluation of automatic translation of user-generated content, whose well-known specificities raise many challenges for MT.
no code implementations • LREC 2022 • Julien Launay, E. L. Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Alessandro Cappelli, Iacopo Poli, Djamé Seddah
We fit a scaling law for compute for the French language, and compare it with its English counterpart.
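As a hedged illustration of what fitting such a compute scaling law involves, the sketch below fits the standard power-law form L(C) = a · C^(−b) in log-log space; the functional form, constants, and data points are assumptions for illustration, not the paper's actual results.
```python
# Fit an assumed power-law scaling law L(C) = a * C**(-b) to illustrative points.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])   # training compute in FLOPs (illustrative)
loss = np.array([3.9, 3.4, 3.0, 2.6])          # validation loss (illustrative)

# Least-squares fit of log L = log a + slope * log C, where slope = -b.
slope, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"L(C) ≈ {np.exp(log_a):.2f} * C^({slope:.3f})")
```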
1 code implementation • EMNLP (insights) 2021 • Ghazi Felhi, Joseph Le Roux, Djamé Seddah
We compare the simplified versions to standard SSVAEs on 4 text classification tasks.
1 code implementation • EACL 2021 • Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah
Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning.
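A minimal sketch of that protocol, assuming a multilingual encoder (bert-base-multilingual-cased here) and tiny illustrative data: fine-tune on the source language only, then evaluate directly on a target language never seen during fine-tuning.
```python
# Zero-shot cross-lingual transfer sketch: train on English, test on French.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Fine-tune on the source language only (toy English sentiment data).
en_batch = tok(["great movie", "terrible movie"], return_tensors="pt", padding=True)
en_labels = torch.tensor([1, 0])
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few illustrative gradient steps
    loss = model(**en_batch, labels=en_labels).loss
    loss.backward()
    optim.step()
    optim.zero_grad()

# Evaluate directly on the target language, never seen during fine-tuning.
model.eval()
fr_batch = tok(["film magnifique", "film horrible"], return_tensors="pt", padding=True)
with torch.no_grad():
    print(model(**fr_batch).logits.argmax(-1))
```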
no code implementations • 24 Dec 2020 • Yves Rychener, Xavier Renard, Djamé Seddah, Pascal Frossard, Marcin Detyniecki
NLP Interpretability aims to increase trust in model predictions.
1 code implementation • 24 Dec 2020 • Ghazi Felhi, Joseph Le Roux, Djamé Seddah
We present an unsupervised method to obtain disentangled representations of sentences that single out semantic content.
1 code implementation • 24 Dec 2020 • Yves Rychener, Xavier Renard, Djamé Seddah, Pascal Frossard, Marcin Detyniecki
Current methods for Black-Box NLP interpretability, like LIME or SHAP, are based on altering the text to interpret by removing words and modeling the Black-Box response.
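A toy leave-one-word-out sketch in the spirit of such perturbation-based methods: remove each word in turn and treat the drop in the model's score as that word's importance. The black_box scorer here is a hypothetical stand-in for any model returning a probability.
```python
# Hypothetical toy black box: counts "good" as positive evidence.
def black_box(text: str) -> float:
    return 1.0 if "good" in text.split() else 0.2

def word_importances(text: str):
    words = text.split()
    full_score = black_box(text)
    scores = {}
    for i, w in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])  # remove one word
        scores[w] = full_score - black_box(perturbed)    # score drop = importance
    return scores

print(word_importances("this movie is really good"))
# {'this': 0.0, 'movie': 0.0, 'is': 0.0, 'really': 0.0, 'good': 0.8}
```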
no code implementations • 3 Nov 2020 • Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis.
1 code implementation • NAACL 2021 • Benjamin Muller, Antonios Anastasopoulos, Benoît Sagot, Djamé Seddah
Focusing on the latter, we show that this failure to transfer is largely related to the impact of the script used to write such languages.
1 code implementation • EMNLP 2021 • Arij Riabi, Thomas Scialom, Rachel Keraron, Benoît Sagot, Djamé Seddah, Jacopo Staiano
Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task.
2 code implementations • 13 Oct 2020 • Ghazi Felhi, Joseph Le Roux, Djamé Seddah
Even though Variational Autoencoders (VAEs) are widely used for semi-supervised learning, the reason why they work remains unclear.
no code implementations • 1 May 2020 • Benjamin Muller, Benoît Sagot, Djamé Seddah
Building natural language processing systems for non-standardized and low-resource languages is a difficult challenge.
8 code implementations • ACL 2020 • Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot
We show that the use of web-crawled data is preferable to the use of Wikipedia data.
Ranked #1 on Natural Language Inference on XNLI French