Search Results for author: Djamé Seddah

Found 30 papers, 12 papers with code

Can Character-based Language Models Improve Downstream Task Performances In Low-Resource And Noisy Language Scenarios?

no code implementations WNUT (ACL) 2021 Arij Riabi, Benoît Sagot, Djamé Seddah

Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages.

Dependency Parsing Language Modelling +1

From Raw Text to Enhanced Universal Dependencies: The Parsing Shared Task at IWPT 2021

no code implementations ACL (IWPT) 2021 Gosse Bouma, Djamé Seddah, Daniel Zeman

We describe the second IWPT task on end-to-end parsing from raw text to Enhanced Universal Dependencies.

Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content

no code implementations WS (NoDaLiDa) 2019 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

This work compares the performance achieved by Phrase-Based Statistical Machine Translation systems (PB-SMT) and attention-based Neural Machine Translation systems (NMT) when translating User Generated Content (UGC), as encountered on social media, from French to English.

Machine Translation NMT +1

Tâches Auxiliaires Multilingues pour le Transfert de Modèles de Détection de Discours Haineux (Multilingual Auxiliary Tasks for Zero-Shot Cross-Lingual Transfer of Hate Speech Detection)

no code implementations JEP/TALN/RECITAL 2022 Arij Riabi, Syrielle Montariol, Djamé Seddah

Detecting hateful content is a difficult task because it requires deep cultural and contextual knowledge; the knowledge required varies with, among other things, the speaker's language or the target of the content.

Hate Speech Detection Zero-Shot Cross-Lingual Transfer

Cloaked Classifiers: Pseudonymization Strategies on Sensitive Classification Tasks

no code implementations 25 Jun 2024 Arij Riabi, Menel Mahamdi, Virginie Mouilleron, Djamé Seddah

Protecting privacy is essential when sharing data, particularly in the case of an online radicalization dataset that may contain personal information.

Classification

From Text to Source: Results in Detecting Large Language Model-Generated Content

no code implementations 23 Sep 2023 Wissam Antoun, Benoît Sagot, Djamé Seddah

The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking detection.

Attribute Language Modelling +3

Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?

no code implementations 9 Jun 2023 Wissam Antoun, Virginie Mouilleron, Benoît Sagot, Djamé Seddah

This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes.

Adversarial Text Language Modelling

Data-Efficient French Language Modeling with CamemBERTa

no code implementations 2 Jun 2023 Wissam Antoun, Benoît Sagot, Djamé Seddah

In this paper, we introduce CamemBERTa, a French DeBERTa model that builds upon the DeBERTaV3 architecture and training objective.

Dependency Parsing FLUE +5

Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models

no code implementations 24 Oct 2022 Syrielle Montariol, Arij Riabi, Djamé Seddah

Zero-shot cross-lingual transfer learning has been shown to be highly challenging for tasks involving a lot of linguistic specificities or when a cultural gap is present between languages, such as in hate speech detection.

Hate Speech Detection named-entity-recognition +5

Towards Unsupervised Content Disentanglement in Sentence Representations via Syntactic Roles

1 code implementation 22 Jun 2022 Ghazi Felhi, Joseph Le Roux, Djamé Seddah

Starting from a deep probabilistic generative model with attention, we measure the interaction between latent variables and realizations of syntactic roles and show that it is possible to obtain, without supervision, representations of sentences where different syntactic roles correspond to clearly identified different latent variables.

Decoder Disentanglement +2

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

1 code implementation ACL 2020 Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg

The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science.

Word Embeddings

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?

no code implementations 26 Oct 2021 Arij Riabi, Benoît Sagot, Djamé Seddah

Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages.

Dependency Parsing Language Modelling +1

Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models

1 code implementation WNUT (ACL) 2021 José Carlos Rosales Núñez, Guillaume Wisniewski, Djamé Seddah

This work explores the capacities of character-based Neural Machine Translation to translate noisy User-Generated Content (UGC), with a strong focus on exploring the limits of such approaches in handling productive UGC phenomena, which, almost by definition, cannot be seen at training time.

Machine Translation Translation

Understanding the Impact of UGC Specificities on Translation Quality

no code implementations WNUT (ACL) 2021 José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

This work takes a critical look at the evaluation of user-generated content automatic translation, the well-known specificities of which raise many challenges for MT.

Translation

First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

1 code implementation EACL 2021 Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah

Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning.

Language Modelling Zero-Shot Cross-Lingual Transfer

Disentangling semantics in language through VAEs and a certain architectural choice

1 code implementation 24 Dec 2020 Ghazi Felhi, Joseph Le Roux, Djamé Seddah

We present an unsupervised method to obtain disentangled representations of sentences that single out semantic content.

Open Information Extraction Sentence

On the Granularity of Explanations in Model Agnostic NLP Interpretability

1 code implementation 24 Dec 2020 Yves Rychener, Xavier Renard, Djamé Seddah, Pascal Frossard, Marcin Detyniecki

Current methods for Black-Box NLP interpretability, like LIME or SHAP, are based on altering the text to interpret by removing words and modeling the Black-Box response.

Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations

no code implementations 3 Nov 2020 Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes

This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis.
