no code implementations • 16 Dec 2024 • Arij Riabi, Virginie Mouilleron, Menel Mahamdi, Wissam Antoun, Djamé Seddah
The proliferation of radical content on online platforms poses significant risks, including inciting violence and spreading extremist ideologies.
no code implementations • 13 Nov 2024 • Wissam Antoun, Francis Kulumba, Rian Touchent, Éric de la Clergerie, Benoît Sagot, Djamé Seddah
In this paper, we introduce two new versions of the CamemBERT base model, CamemBERTav2 and CamemBERTv2, designed to address these challenges.
no code implementations • 30 Jul 2024 • Francis Kulumba, Wissam Antoun, Guillaume Vimont, Laurent Romary
HAL (Hyper Articles en Ligne) is the French national publication repository, used by most higher education and research organizations for their open science policy.
no code implementations • 23 Sep 2023 • Wissam Antoun, Benoît Sagot, Djamé Seddah
The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking detection.
no code implementations • 9 Jun 2023 • Wissam Antoun, Virginie Mouilleron, Benoît Sagot, Djamé Seddah
This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes.
no code implementations • 2 Jun 2023 • Wissam Antoun, Benoît Sagot, Djamé Seddah
In this paper, we introduce CamemBERTa, a French DeBERTa model that builds upon the DeBERTaV3 architecture and training objective.
1 code implementation • EACL (WANLP) 2021 • Tarek Naous, Wissam Antoun, Reem A. Mahmoud, Hazem Hajj
The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable to train NLG models such as conversational agents.
1 code implementation • EACL (WANLP) 2021 • Wissam Antoun, Fady Baly, Hazem Hajj
In this paper, we develop the first advanced Arabic language generation model, AraGPT2, trained from scratch on a large Arabic corpus of internet text and news articles.
1 code implementation • EACL (WANLP) 2021 • Wissam Antoun, Fady Baly, Hazem Hajj
Advances in English language representation enabled a more sample-efficient pre-training task by Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA).
no code implementations • LREC 2020 • Marc Djandji, Fady Baly, Wissam Antoun, Hazem Hajj
The shared task on Offensive Language Detection at OSACT4 aimed to achieve state-of-the-art profane language detection methods for Arabic social media.
3 code implementations • LREC 2020 • Wissam Antoun, Fady Baly, Hazem Hajj
Recently, with the surge of transformer-based models, language-specific BERT-based models have proven to be very efficient at language understanding, provided they are pre-trained on a very large corpus.
Ranked #1 on Sentiment Analysis on AJGT
1 code implementation • WS 2019 • Obeida ElJundi, Wissam Antoun, Nour El Droubi, Hazem Hajj, Wassim El-Hajj, Khaled Shaban
Experimental results show that the developed hULMonA and multi-lingual ULM generalize well to multiple Arabic data sets and achieve new state-of-the-art results in Arabic Sentiment Analysis for some of the tested sets.