In 2020, the Workshop on Online Abuse and Harms (WOAH) held a satellite panel at RightsCon, an international human rights conference.
We present the results and main findings of the shared task at WOAH 5 on hateful memes detection.
We investigate the use of machine learning classifiers for detecting online abuse in empirical research.
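For readers unfamiliar with this setup, here is a minimal sketch of such a classifier using scikit-learn. The training examples and labels are purely illustrative and are not drawn from any of the datasets discussed here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; real abuse-detection corpora are far larger and
# require careful annotation (see the discussion of annotation costs below).
texts = ["you are awful", "have a nice day", "get lost idiot", "great post"]
labels = [1, 0, 1, 0]  # 1 = abusive, 0 = not abusive

# TF-IDF features feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["what a lovely idea"]))  # expected: [0]
```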
We test 16 state-of-the-art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long-context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400: 16 configurations × 150 cases).
While some of the models do not give a single unsafe response, most give unsafe responses to more than 20% of the prompts, and in the worst cases to over 50%.
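As a rough illustration of how such unsafe-response rates are tallied from manual review labels, consider the sketch below. The data layout and function are hypothetical, not the authors' evaluation code:

```python
from collections import defaultdict

# Hypothetical records: (model_name, was_response_unsafe) pairs produced
# by manually reviewing each model's answers to the test prompts.
reviews = [
    ("model_a", False), ("model_a", True),
    ("model_b", False), ("model_b", False),
]

def unsafe_rate_per_model(reviews):
    """Return the fraction of responses judged unsafe, per model."""
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for model, is_unsafe in reviews:
        totals[model] += 1
        unsafe[model] += int(is_unsafe)
    return {m: unsafe[m] / totals[m] for m in totals}

print(unsafe_rate_per_model(reviews))
# e.g. {'model_a': 0.5, 'model_b': 0.0}
```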
Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs).
In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers.
In this paper, we introduce a new test suite called XSTest to identify such eXaggerated Safety behaviours in a systematic way.
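To make the idea concrete, a toy check for exaggerated safety might flag refusals to clearly safe prompts. The keyword heuristic below is a deliberate simplification, and the `generate` callable is a stand-in for any model API; neither is part of XSTest itself:

```python
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def looks_like_refusal(response: str) -> bool:
    """Crude keyword heuristic for spotting a refusal in a model response."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def exaggerated_safety_rate(safe_prompts, generate) -> float:
    """Fraction of clearly safe prompts that the model refuses to answer."""
    refusals = sum(looks_like_refusal(generate(p)) for p in safe_prompts)
    return refusals / len(safe_prompts)

# Demo with a stubbed model that refuses everything.
print(exaggerated_safety_rate(
    ["How do I kill a Python process?"],
    lambda prompt: "I'm sorry, I can't help with that."))  # prints 1.0
```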
Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in coming years due to integration in product interfaces like ChatGPT or search engines like Bing.
Annotating abusive language is expensive, logistically complex and creates a risk of psychological harm.
To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models.
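A functional test suite of this kind can be represented very simply. The schema below is an illustrative approximation of how MHC-style cases pair a text with a target functionality and gold label; it is not MHC's actual release format:

```python
from dataclasses import dataclass

@dataclass
class FunctionalTestCase:
    text: str
    language: str
    functionality: str   # e.g. "slur reclamation", "negated hate"
    gold_label: str      # "hateful" or "non-hateful"

def pass_rate(cases, classify):
    """Share of test cases where the model's label matches the gold label."""
    passed = sum(classify(c.text, c.language) == c.gold_label for c in cases)
    return passed / len(cases)

# Demo with a single case and a stub classifier.
cases = [FunctionalTestCase("I love all people", "en",
                            "positive statement", "non-hateful")]
print(pass_rate(cases, lambda text, lang: "non-hateful"))  # prints 1.0
```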
We show that it is crucial to account for the influencer-level structure, and find evidence of the importance of both influencer- and content-level factors, including the number of followers each influencer has, the type of content (original posts, quotes and replies), the length and toxicity of content, and whether influencers request retweets.
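One standard way to account for that influencer-level structure is a multilevel (mixed-effects) model with a random intercept per influencer, so content-level effects are not confounded with differences between influencers. A minimal sketch using statsmodels follows; the file name and column names are hypothetical, and this is not the paper's actual specification:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataframe: one row per post, with an influencer_id column
# grouping posts by their author.
df = pd.read_csv("posts.csv")

# Random intercept per influencer captures influencer-level structure;
# the fixed effects capture influencer- and content-level factors.
model = smf.mixedlm(
    "retweets ~ followers + content_type + length + toxicity + requests_rt",
    data=df,
    groups=df["influencer_id"],
)
result = model.fit()
print(result.summary())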
We show that both text and visual enrichment improve model performance, with the multimodal model (F1 = 0.771) outperforming the other models (F1 = 0.544, 0.737, and 0.754).
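The F1 comparison itself is straightforward to reproduce for any set of models scored against the same gold labels. A toy sketch with scikit-learn is shown below; the labels and predictions are illustrative, not the paper's results:

```python
from sklearn.metrics import f1_score

# Toy example: comparing models by F1 on the same gold labels.
gold = [1, 0, 1, 1, 0, 1]
predictions = {
    "text_only":  [1, 0, 0, 1, 0, 0],
    "multimodal": [1, 0, 1, 1, 0, 1],
}
for name, preds in predictions.items():
    print(name, round(f1_score(gold, preds), 3))
```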
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.
Online misogyny is a pernicious social problem that risks making online platforms toxic and unwelcoming to women.
Yet, most research in online hate detection to date has focused on hateful content.
We provide a new dataset of ~40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation.
The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic.
Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies.
Far-right actors are often purveyors of Islamophobic hate speech online, using social media to spread divisive and prejudiced messages which can stir up intergroup tensions and conflict.
Islamophobic hate speech on social media inflicts considerable harm on both targeted individuals and wider society, and also risks reputational damage for the host platforms.