Search Results for author: Tarek Naous

Found 11 papers, 7 papers with code

On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena

1 code implementation 8 Jan 2025 Tarek Naous, Wei Xu

We introduce CAMeL-2, a parallel Arabic-English benchmark of 58,086 entities associated with Arab and Western cultures and 367 masked natural contexts for entities.

Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI

no code implementations 19 Dec 2024 Isadora Krsek, Anubha Kabra, Yao Dou, Tarek Naous, Laura A. Dabbish, Alan Ritter, Wei Xu, Sauvik Das

In pseudonymous online fora like Reddit, the benefits of self-disclosure are often apparent to users (e.g., I can vent about my in-laws to understanding strangers), but the privacy risks are more abstract (e.g., will my partner be able to tell that this is me?).

Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation

no code implementations 6 Feb 2024 Anton Lavrouk, Ian Ligon, Tarek Naous, Jonathan Zheng, Alan Ritter, Wei Xu

The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation.

Misinformation, Stance Classification +1

Revisiting non-English Text Simplification: A Unified Multilingual Benchmark

1 code implementation 25 May 2023 Michael J. Ryan, Tarek Naous, Wei Xu

However, less work has been done on multilingual text simplification due to the lack of a diverse evaluation benchmark that covers complex-simple sentence pairs in many languages.

Sentence, Text Simplification +1

Having Beer after Prayer? Measuring Cultural Bias in Large Language Models

1 code implementation 23 May 2023 Tarek Naous, Michael J. Ryan, Alan Ritter, Wei Xu

In this paper, we show that multilingual and Arabic monolingual LMs exhibit bias towards entities associated with Western culture.

Named Entity Recognition +4

Stanceosaurus: Classifying Stance Towards Multilingual Misinformation

no code implementations 28 Oct 2022 Jonathan Zheng, Ashutosh Baheti, Tarek Naous, Wei Xu, Alan Ritter

We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims.

Domain Adaptation, Fact Checking +1

Clustering Plotted Data by Image Segmentation

1 code implementation CVPR 2022 Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou

We describe the method and compare it to ten other clustering methods on synthetic data to illustrate its advantages and disadvantages.

Clustering, Image Segmentation +3

Empathetic BERT2BERT Conversational Model: Learning Arabic Language Generation with Little Data

1 code implementation EACL (WANLP) 2021 Tarek Naous, Wissam Antoun, Reem A. Mahmoud, Hazem Hajj

The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable to train NLG models such as conversational agents.

Decoder, Empathetic Response Generation +4
