1 code implementation • COLING (WANLP) 2020 • Tarek Naous, Christian Hokayem, Hazem Hajj
However, the dataset is not large enough to develop very complex encoder-decoder models.
no code implementations • 6 Feb 2024 • Anton Lavrouk, Ian Ligon, Tarek Naous, Jonathan Zheng, Alan Ritter, Wei Xu
The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation.
no code implementations • 16 Nov 2023 • Yao Dou, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, Wei Xu
Motivated by user feedback, we introduce the task of self-disclosure abstraction: paraphrasing disclosures into less specific terms while preserving their utility, e.g., "Im 16F" to "I'm a teenage girl".
1 code implementation • 25 May 2023 • Michael J. Ryan, Tarek Naous, Wei Xu
However, less work has been done on multilingual text simplification due to the lack of a diverse evaluation benchmark that covers complex-simple sentence pairs in many languages.
Ranked #1 on Text Simplification on WikiLargeFR
1 code implementation • 23 May 2023 • Tarek Naous, Michael J. Ryan, Anton Lavrouk, Mohit Chandra, Wei Xu
We present a systematic study and comprehensive evaluation of large language models for automatic multilingual readability assessment.
no code implementations • 23 May 2023 • Tarek Naous, Michael J. Ryan, Alan Ritter, Wei Xu
In this paper, we show that multilingual and Arabic monolingual LMs exhibit bias towards entities associated with Western culture.
no code implementations • 28 Oct 2022 • Jonathan Zheng, Ashutosh Baheti, Tarek Naous, Wei Xu, Alan Ritter
We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims.
1 code implementation • CVPR 2022 • Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou
We describe the method and compare it to ten other clustering methods on synthetic data to illustrate its advantages and disadvantages.
1 code implementation • EACL (WANLP) 2021 • Tarek Naous, Wissam Antoun, Reem A. Mahmoud, Hazem Hajj
The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable for training NLG models such as conversational agents.
Tasks: Empathetic Response Generation, Natural Language Understanding, +3