SaRoCo is a dataset for satire detection in Romanian news. It contains 55,608 news articles from multiple real and satirical news sources, of which 27,980 are regular and 27,628 are satirical news reports. We provide the data in CSV format, in three files corresponding to the train/validation/test splits.
2 PAPERS • NO BENCHMARKS YET
GeoCoV19 is a large-scale Twitter dataset containing more than 524 million multilingual tweets. It includes around 378K geotagged tweets and 5.4 million tweets with Place information. The annotations identify toponyms in the user location field and in the tweet content and resolve them to geolocations at the country, state, or city level. Specifically, 297 million tweets are annotated with geolocation using the user location field and 452 million using the tweet content.
3 PAPERS • NO BENCHMARKS YET