The Hateful Memes data set is a multimodal dataset for hateful meme detection (image + text) that contains 10,000+ new multimodal examples created by Facebook AI. Images were licensed from Getty Images so that researchers can use the data set to support their work.
144 PAPERS • 1 BENCHMARK
MEIR is a substantially challenging dataset over that which has been previously available to support research into image repurposing detection. The new dataset includes location, person, and organization manipulations on real-world data sourced from Flickr.
8 PAPERS • NO BENCHMARKS YET
MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags), spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which have been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade.
4 PAPERS • 3 BENCHMARKS
Detecting out-of-context media, such as "mis-captioned" images on Twitter, is a relevant problem, especially in domains of high public significance. Twitter-COMMs is a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles. This dataset can be used to develop methods to detect misinformation on social media platforms related to these three topics.
2 PAPERS • NO BENCHMARKS YET
MMVax-Stance includes 113 Vaccine Hesitancy Framings found on Twitter about the COVID-19 vaccines. Language experts annotated multimodal image-text tweets as Relevant or Not Relevant, and then further annotated Relevant tweets with Stance towards each framing.
1 PAPER • NO BENCHMARKS YET