Search Results for author: Djamé Seddah

Found 29 papers, 12 papers with code

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?

no code implementations WNUT (ACL) 2021 Arij Riabi, Benoît Sagot, Djamé Seddah

Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages.

Dependency Parsing Language Modelling +1

Tâches Auxiliaires Multilingues pour le Transfert de Modèles de Détection de Discours Haineux (Multilingual Auxiliary Tasks for Zero-Shot Cross-Lingual Transfer of Hate Speech Detection)

no code implementations JEP/TALN/RECITAL 2022 Arij Riabi, Syrielle Montariol, Djamé Seddah

The task of detecting hateful content is difficult, as it requires deep cultural and contextual knowledge; the knowledge needed varies, among other factors, with the speaker's language or the target of the content.

Hate Speech Detection Zero-Shot Cross-Lingual Transfer

From Raw Text to Enhanced Universal Dependencies: The Parsing Shared Task at IWPT 2021

no code implementations ACL (IWPT) 2021 Gosse Bouma, Djamé Seddah, Daniel Zeman

We describe the second IWPT task on end-to-end parsing from raw text to Enhanced Universal Dependencies.

Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content

no code implementations WS (NoDaLiDa) 2019 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

This work compares the performance achieved by Phrase-Based Statistical Machine Translation systems (PB-SMT) and attention-based Neural Machine Translation systems (NMT) when translating User-Generated Content (UGC), as encountered in social media, from French to English.

Machine Translation NMT +1

From Text to Source: Results in Detecting Large Language Model-Generated Content

no code implementations 23 Sep 2023 Wissam Antoun, Benoît Sagot, Djamé Seddah

The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking detection.

Attribute Language Modelling +3

Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?

no code implementations 9 Jun 2023 Wissam Antoun, Virginie Mouilleron, Benoît Sagot, Djamé Seddah

This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes.

Adversarial Text Language Modelling

Data-Efficient French Language Modeling with CamemBERTa

no code implementations 2 Jun 2023 Wissam Antoun, Benoît Sagot, Djamé Seddah

In this paper, we introduce CamemBERTa, a French DeBERTa model that builds upon the DeBERTaV3 architecture and training objective.

Dependency Parsing FLUE +5

Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models

no code implementations 24 Oct 2022 Syrielle Montariol, Arij Riabi, Djamé Seddah

Zero-shot cross-lingual transfer learning has been shown to be highly challenging for tasks involving a lot of linguistic specificities or when a cultural gap is present between languages, such as in hate speech detection.

Hate Speech Detection named-entity-recognition +5

Towards Unsupervised Content Disentanglement in Sentence Representations via Syntactic Roles

1 code implementation 22 Jun 2022 Ghazi Felhi, Joseph Le Roux, Djamé Seddah

Starting from a deep probabilistic generative model with attention, we measure the interaction between latent variables and realizations of syntactic roles and show that it is possible to obtain, without supervision, representations of sentences where different syntactic roles correspond to clearly identified different latent variables.

Disentanglement Machine Translation +1

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

1 code implementation ACL 2020 Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg

The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science.

Word Embeddings

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?

no code implementations 26 Oct 2021 Arij Riabi, Benoît Sagot, Djamé Seddah

Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages.

Dependency Parsing Language Modelling +1

Understanding the Impact of UGC Specificities on Translation Quality

no code implementations WNUT (ACL) 2021 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

This work takes a critical look at the evaluation of automatic translation of user-generated content (UGC), whose well-known specificities raise many challenges for MT.

Translation

Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models

1 code implementation WNUT (ACL) 2021 José Carlos Rosales Núñez, Guillaume Wisniewski, Djamé Seddah

This work explores the capacity of character-based Neural Machine Translation to translate noisy User-Generated Content (UGC), with a strong focus on exploring the limits of such approaches in handling productive UGC phenomena, which, almost by definition, cannot be seen at training time.

Machine Translation Translation

First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

1 code implementation EACL 2021 Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah

Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning.

Language Modelling Zero-Shot Cross-Lingual Transfer
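
The fine-tune-then-evaluate protocol described in the entry above can be illustrated with a small sketch. This is only an assumption-laden illustration: the model name (bert-base-multilingual-cased), the toy sentiment examples, and the minimal training loop are placeholders, not the paper's actual experimental setup.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Multilingual encoder of the kind studied in the paper; the task data below is a toy stand-in.
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Fine-tune on a handful of labelled examples in the source language (English).
train_texts = ["great movie", "terrible plot"]
train_labels = torch.tensor([1, 0])
batch = tokenizer(train_texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps, for illustration only
    loss = model(**batch, labels=train_labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot evaluation on a distinct language (French), never seen during fine-tuning.
model.eval()
with torch.no_grad():
    test = tokenizer(["film magnifique"], return_tensors="pt")
    print(model(**test).logits.argmax(dim=-1))

The paper's contribution is an analysis of where cross-lingual alignment emerges inside such a model; the sketch only shows the transfer setup it builds on.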

Disentangling semantics in language through VAEs and a certain architectural choice

1 code implementation 24 Dec 2020 Ghazi Felhi, Joseph Le Roux, Djamé Seddah

We present an unsupervised method to obtain disentangled representations of sentences that single out semantic content.

Open Information Extraction Sentence

On the Granularity of Explanations in Model Agnostic NLP Interpretability

1 code implementation24 Dec 2020 Yves Rychener, Xavier Renard, Djamé Seddah, Pascal Frossard, Marcin Detyniecki

Current methods for Black-Box NLP interpretability, like LIME or SHAP, are based on altering the text to interpret by removing words and modeling the Black-Box response.
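
As a rough illustration of the word-removal perturbation idea mentioned above, the sketch below deletes one word at a time, queries a black-box scorer on each altered text, and reads the score drop as that word's importance; LIME and SHAP build on many such perturbations to fit a local surrogate. The black_box function here is a placeholder assumption, not a real model.

def black_box(text: str) -> float:
    # Placeholder black-box classifier: returns a positive-class score.
    return 0.9 if "excellent" in text else 0.2

def word_importances(text: str, predict=black_box) -> dict:
    # Leave-one-word-out: a word's importance is the score drop when it is removed.
    words = text.split()
    base = predict(text)
    importances = {}
    for i, word in enumerate(words):
        altered = " ".join(words[:i] + words[i + 1:])
        importances[word] = base - predict(altered)
    return importances

print(word_importances("the food was excellent and cheap"))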

Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations

no code implementations 3 Nov 2020 Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes

This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis.
