MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags). It comprises 21 million tweets belonging to 26 thousand Twitter threads, which have been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade.
4 PAPERS • 3 BENCHMARKS
WikiText-TL-39 is a benchmark language modeling dataset in Filipino that has 39 million tokens in the training set.
3 PAPERS • NO BENCHMARKS YET
NewsPH-NLI is a sentence entailment benchmark dataset for Filipino, a low-resource language.
2 PAPERS • NO BENCHMARKS YET
An expertly curated benchmark dataset for fake news detection in Filipino.
1 PAPER • NO BENCHMARKS YET
500 Hours - Filipino Speech Data by Mobile Phone: the data were recorded by Filipino speakers with authentic Filipino accents. The text is manually proofread with high accuracy. The recordings match mainstream Android and Apple phones.
0 PAPERS • NO BENCHMARKS YET