MuMiN

Introduced by Nielsen et al. in MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags). It comprises 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, covering more than a decade.

MuMiN fills a gap in the existing misinformation datasets in multiple ways:

By providing a large amount of social media information that has been semantically linked to fact-checked claims on an individual basis.
By featuring 41 languages, enabling evaluation of multilingual misinformation detection models.
By featuring tweets, articles, images, social connections and hashtags together, enabling multimodal approaches to misinformation detection.

MuMiN features two node classification tasks, related to the veracity of a claim:

Claim classification: Determine the veracity of a claim, given its social network context.
Tweet classification: Determine the likelihood that a social media post to be fact-checked discusses a misleading claim, given its social network context.
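
As a rough illustration of what "node classification given social network context" can mean, the following minimal sketch builds a tiny toy graph and labels each claim node from the features of its neighbouring tweets. Everything in it is an assumption made for illustration: the node ids, the single numeric feature, the "discusses" relation, the label names, the threshold and the averaging rule are all hypothetical, and none of it is taken from the MuMiN data or its official tooling.

```python
from collections import defaultdict

# Hypothetical toy graph: each node has a type and one made-up numeric feature.
nodes = {
    "claim_1": {"type": "claim", "feature": 0.0},
    "claim_2": {"type": "claim", "feature": 0.0},
    "tweet_1": {"type": "tweet", "feature": 0.9},
    "tweet_2": {"type": "tweet", "feature": 0.7},
    "tweet_3": {"type": "tweet", "feature": 0.1},
}

# Hypothetical typed edges: (source, relation, target).
edges = [
    ("tweet_1", "discusses", "claim_1"),
    ("tweet_2", "discusses", "claim_1"),
    ("tweet_3", "discusses", "claim_2"),
]

# Build an undirected adjacency list so context can flow in both directions.
adjacency = defaultdict(list)
for source, _relation, target in edges:
    adjacency[source].append(target)
    adjacency[target].append(source)


def context_score(node_id):
    """Return the mean feature of a node's neighbours (0.0 if it has none)."""
    neighbour_ids = adjacency[node_id]
    if not neighbour_ids:
        return 0.0
    total = sum(nodes[n]["feature"] for n in neighbour_ids)
    return total / len(neighbour_ids)


def classify_claim(node_id, threshold=0.5):
    """Label a claim node from its neighbourhood; labels and threshold are assumed."""
    if nodes[node_id]["type"] != "claim":
        raise ValueError(f"{node_id} is not a claim node")
    return "misinformation" if context_score(node_id) >= threshold else "factual"


if __name__ == "__main__":
    for claim_id in ("claim_1", "claim_2"):
        print(claim_id, classify_claim(claim_id))
```

Tweet classification has the same shape with the roles swapped: the node being labelled is a tweet, and its context is whatever it connects to in the graph. A real model would replace the averaging rule with a learned one, such as a graph neural network over the full heterogeneous graph.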

To use the dataset, see the "Getting Started" guide and tutorial at the MuMiN website.