no code implementations • EMNLP (ALW) 2020 • Ilan Price, Jordan Gifford-Moore, Jory Flemming, Saul Musker, Maayan Roichman, Guillaume Sylvain, Nithum Thain, Lucas Dixon, Jeffrey Sorensen
We present a new dataset of approximately 44,000 comments labeled by crowdworkers.
1 code implementation • ACL 2022 • John Pavlopoulos, Leo Laugier, Alexandros Xenos, Jeffrey Sorensen, Ion Androutsopoulos
We study the task of toxic spans detection, which concerns the detection of the spans that make a text toxic, when detecting such spans is possible.
no code implementations • NAACL (WOAH) 2022 • Alyssa Chvasta, Alyssa Lees, Jeffrey Sorensen, Lucy Vasserman, Nitesh Goyal
In an era of increasingly large pre-trained language models, knowledge distillation is a powerful tool for transferring information from a large model to a smaller one.
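The distillation objective mentioned above can be sketched minimally (this is the standard temperature-softened KL formulation, not necessarily the exact setup used in the paper; all names here are illustrative):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits at the given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable as the
    temperature varies (the usual correction from the distillation
    literature).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# The student is trained to minimize this loss, optionally mixed with
# the ordinary cross-entropy on gold labels.
loss = distillation_loss([4.0, 1.0, -2.0], [3.0, 1.5, -1.0])
```

A higher temperature spreads probability mass over more classes, exposing the teacher's "dark knowledge" about relative similarities between labels.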
no code implementations • SemEval (NAACL) 2022 • Elisabetta Fersini, Francesca Gasparini, Giulia Rizzi, Aurora Saibene, Berta Chulvi, Paolo Rosso, Alyssa Lees, Jeffrey Sorensen
The paper describes the SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification (MAMI), which explores the detection of misogynous memes on the web by taking advantage of available texts and images.
no code implementations • 22 Feb 2022 • Alyssa Lees, Vinh Q. Tran, Yi Tay, Jeffrey Sorensen, Jai Gupta, Donald Metzler, Lucy Vasserman
As such, it is crucial to develop models that are effective across a diverse range of languages, usages, and styles.
no code implementations • 19 Nov 2021 • Alexandros Xenos, John Pavlopoulos, Ion Androutsopoulos, Lucas Dixon, Jeffrey Sorensen, Leo Laugier
User posts whose perceived toxicity depends on the conversational context are rare in current toxicity detection datasets.
no code implementations • SEMEVAL 2021 • John Pavlopoulos, Jeffrey Sorensen, Léo Laugier, Ion Androutsopoulos
For the supervised sequence labeling approach and evaluation purposes, posts previously labeled as toxic were crowd-annotated for toxic spans.
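To set up sequence labeling from span annotations, character-offset labels are typically projected onto tokens as B/I/O tags. A small sketch of that conversion (function and tag names are hypothetical, not the task's official scorer):

```python
def char_spans_to_bio(text, toxic_offsets):
    """Project character-level toxic offsets onto whitespace tokens.

    toxic_offsets: set of character indices annotated as toxic.
    Returns one B-TOXIC / I-TOXIC / O label per token.
    """
    labels = []
    pos = 0
    prev_toxic = False
    for token in text.split():
        start = text.index(token, pos)  # locate token in original text
        end = start + len(token)
        pos = end
        toxic = any(i in toxic_offsets for i in range(start, end))
        if not toxic:
            labels.append("O")
        elif prev_toxic:
            labels.append("I-TOXIC")  # continues a toxic span
        else:
            labels.append("B-TOXIC")  # begins a toxic span
        prev_toxic = toxic
    return labels

# Characters 10-19 cover "total jerk" in the example below.
tags = char_spans_to_bio("you are a total jerk", set(range(10, 20)))
```

The resulting tags can feed any standard sequence-labeling model (e.g. a token-classification head on a pre-trained encoder).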
1 code implementation • EACL 2021 • Leo Laugier, John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon
Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts.
1 code implementation • ACL 2020 • John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, Ion Androutsopoulos
Moderation is crucial to promoting healthy online discussions.
2 code implementations • 11 Apr 2020 • Varada Kolhatkar, Nithum Thain, Jeffrey Sorensen, Lucas Dixon, Maite Taboada
The quality of the annotation scheme and the resulting dataset are evaluated using inter-annotator agreement, expert assessment of a sample, and the constructiveness sub-characteristics, which we show provide a proxy for the general concept of constructiveness.
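One common way to measure inter-annotator agreement of the kind mentioned above is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch for two annotators with categorical labels (this is the standard formula, not necessarily the paper's exact procedure):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label alike.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal counts.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa([1, 1, 1, 0], [1, 1, 0, 0])
```

Kappa is 1 for perfect agreement, 0 when agreement is at chance level; datasets with subjective labels like constructiveness often report values well below 1.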
4 code implementations • 11 Mar 2019 • Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, Lucy Vasserman
Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large.
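One simple way to surface such group-level performance differences is to compute AUC restricted to the examples mentioning a given identity and compare it with the overall AUC. A hedged sketch in that spirit (the paper defines a fuller suite of metrics; the helper names here are illustrative):

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count half) -- the rank-statistic definition of ROC AUC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auc(scores, labels, in_subgroup):
    """AUC computed only on examples that mention the identity group."""
    sub = [(s, y) for s, y, g in zip(scores, labels, in_subgroup) if g]
    return auc([s for s, _ in sub], [y for _, y in sub])

# A large gap between overall AUC and a subgroup's AUC is one signal
# of unintended bias against that group.
overall = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
for_group = subgroup_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0],
                         [True, False, True, False])
```

Restricting both positives and negatives to the subgroup is only one slice; the paper's metric suite also compares subgroup positives against background negatives and vice versa.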
no code implementations • 5 Mar 2019 • Daniel Borkan, Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, Lucy Vasserman
This report examines the previously introduced Pinned AUC metric and highlights some of its limitations.