1 code implementation • COLING (WANLP) 2020 • Tarek Naous, Christian Hokayem, Hazem Hajj
However, the dataset is not large enough to develop very complex encoder-decoder models.
1 code implementation • 8 Jan 2025 • Tarek Naous, Wei Xu
We introduce CAMeL-2, a parallel Arabic-English benchmark of 58,086 entities associated with Arab and Western cultures and 367 masked natural contexts for entities.
no code implementations • 19 Dec 2024 • Isadora Krsek, Anubha Kabra, Yao Dou, Tarek Naous, Laura A. Dabbish, Alan Ritter, Wei Xu, Sauvik Das
In pseudonymous online fora like Reddit, the benefits of self-disclosure are often apparent to users (e.g., I can vent about my in-laws to understanding strangers), but the privacy risks are more abstract (e.g., will my partner be able to tell that this is me?).
no code implementations • 6 Feb 2024 • Anton Lavrouk, Ian Ligon, Tarek Naous, Jonathan Zheng, Alan Ritter, Wei Xu
The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation.
no code implementations • 16 Nov 2023 • Yao Dou, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, Wei Xu
Self-disclosure, while being common and rewarding in social media interaction, also poses privacy risks.
1 code implementation • 25 May 2023 • Michael J. Ryan, Tarek Naous, Wei Xu
However, less work has been done on multilingual text simplification due to the lack of a diverse evaluation benchmark that covers complex-simple sentence pairs in many languages.
Ranked #1 on Text Simplification on WikiLargeFR
1 code implementation • 23 May 2023 • Tarek Naous, Michael J. Ryan, Anton Lavrouk, Mohit Chandra, Wei Xu
We present a comprehensive evaluation of large language models for multilingual readability assessment.
1 code implementation • 23 May 2023 • Tarek Naous, Michael J. Ryan, Alan Ritter, Wei Xu
In this paper, we show that multilingual and Arabic monolingual LMs exhibit bias towards entities associated with Western culture.
no code implementations • 28 Oct 2022 • Jonathan Zheng, Ashutosh Baheti, Tarek Naous, Wei Xu, Alan Ritter
We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims.
1 code implementation • CVPR 2022 • Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou
We describe the method and compare it to ten other clustering methods on synthetic data to illustrate its advantages and disadvantages.
1 code implementation • EACL (WANLP) 2021 • Tarek Naous, Wissam Antoun, Reem A. Mahmoud, Hazem Hajj
The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable for training NLG models such as conversational agents.