The satire dataset is a new multimodal dataset of satirical and regular news articles, pairing each article's headline with its thumbnail image. The satirical articles are collected from four websites that explicitly declare themselves to be satire: The Babylon Bee, Clickhole, Waterford Whisper News, and The DailyER. The regular articles are collected from six mainstream news websites: Reuters, The Hill, Politico, New York Post, Huffington Post, and Vice News. For each publication, the headlines and thumbnail images of its latest 1,000 articles are collected. The dataset therefore contains a total of 4,000 satirical and 6,000 regular news articles.
1 PAPER • NO BENCHMARKS YET
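The class totals follow directly from the number of publications per label. The minimal sketch below assumes, as the description states, exactly 1,000 articles per publication. It derives the per-label counts and the accuracy of a trivial classifier that always predicts the larger class.

```python
# Minimal sketch of the dataset's composition, assuming (as stated) exactly
# 1000 articles from each listed publication.
ARTICLES_PER_SITE = 1000

SATIRE_SITES = ["The Babylon Bee", "Clickhole", "Waterford Whisper News", "The DailyER"]
REGULAR_SITES = [
    "Reuters", "The Hill", "Politico",
    "New York Post", "Huffington Post", "Vice News",
]


def class_counts(per_site: int = ARTICLES_PER_SITE) -> dict:
    """Number of articles per label, given a fixed number of articles per site."""
    return {
        "satire": len(SATIRE_SITES) * per_site,
        "regular": len(REGULAR_SITES) * per_site,
    }


counts = class_counts()
total = sum(counts.values())
# Accuracy of always predicting the larger class ("regular").
majority_baseline = max(counts.values()) / total

print(counts)             # {'satire': 4000, 'regular': 6000}
print(total)              # 10000
print(majority_baseline)  # 0.6
```

Because the classes are split 40/60, a classifier that always answers "regular" already reaches 60% accuracy. Accuracy figures on this dataset are more informative when read against that baseline.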
The Oxford-BBC Lip Reading Sentences 2 (LRS2) dataset is one of the largest publicly available datasets for lip reading sentences in the wild. It consists mainly of news and talk shows from BBC programs. Each sentence is up to 100 characters long. The training, validation, and test sets are divided according to broadcast date. The dataset is challenging because it contains thousands of speakers without speaker labels and large variation in head pose. The pre-training set contains 96,318 utterances, the training set 45,839, the validation set 1,082, and the test set 1,242.
96 PAPERS • 9 BENCHMARKS
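Splitting by broadcast date rather than at random means that all clips from a single broadcast end up in the same split. The sketch below shows a minimal date-based assignment for the training, validation, and test sets. The pre-training set is left out because the description only says the other three are divided by date. The cutoff dates and clip IDs are hypothetical placeholders, not LRS2's actual boundaries. The sketch also totals the reported split sizes.

```python
from datetime import date

# Utterance counts per split, as reported in the description above.
REPORTED_SIZES = {"pretrain": 96_318, "train": 45_839, "val": 1_082, "test": 1_242}

# HYPOTHETICAL cutoff dates for illustration only; LRS2's real boundaries
# are not stated here.
VAL_START = date(2016, 1, 1)
TEST_START = date(2016, 6, 1)


def split_by_broadcast_date(broadcast: date) -> str:
    """Assign a train/val/test utterance to a split by the date it was broadcast."""
    if broadcast >= TEST_START:
        return "test"
    if broadcast >= VAL_START:
        return "val"
    return "train"


# Hypothetical utterances: (clip id, broadcast date).
utterances = [
    ("clip_a", date(2015, 3, 14)),
    ("clip_b", date(2016, 2, 2)),
    ("clip_c", date(2016, 9, 30)),
]
for clip_id, aired in utterances:
    print(clip_id, split_by_broadcast_date(aired))  # train, val, test respectively

print(sum(REPORTED_SIZES.values()))  # 144481
```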
CASIA V2 is a dataset for image forgery classification. It contains 4,975 images: 1,701 authentic and 3,274 forged.
12 PAPERS • NO BENCHMARKS YET
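Forged images outnumber authentic ones almost two to one (3,274 vs. 1,701). Inverse-frequency class weighting is one common way to offset such imbalance when training a classifier. The sketch below computes these weights from the per-class counts using the formula total / (n_classes × class_count), which is the same formula as scikit-learn's `class_weight="balanced"`. The weighting scheme is a generic choice assumed here for illustration; it is not part of the dataset.

```python
# Per-class image counts from the description.
COUNTS = {"authentic": 1_701, "forged": 3_274}


def inverse_frequency_weights(counts: dict) -> dict:
    """Weight each class by total / (n_classes * class_count).

    Under these weights every class contributes the same total weight,
    total / n_classes, summed over its examples.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = inverse_frequency_weights(COUNTS)
for label, w in weights.items():
    print(f"{label}: {w:.4f}")
```

The rarer authentic class receives a weight above 1 and the forged class a weight below 1, so each class carries equal total weight during training.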
CelebAMask-HQ is a large-scale face image dataset of 30,000 high-resolution face images selected from the CelebA dataset, following CelebA-HQ. Each image has a segmentation mask of facial attributes corresponding to CelebA.
141 PAPERS • 4 BENCHMARKS