InfoTabS consists of human-written textual hypotheses based on premises that are tables extracted from Wikipedia infoboxes.
13 PAPERS • NO BENCHMARKS YET
Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification. The Logo-2K+ dataset contains a diverse range of logo classes from real-world logo images. It contains 167,140 images with 10 root categories and 2,341 leaf categories. The 10 root categories are: Food, Clothes, Institution, Accessories, Transportation, Electronic, Necessities, Cosmetic, Leisure and Medical.
11 PAPERS • NO BENCHMARKS YET
MED is an evaluation dataset that covers a wide range of monotonicity reasoning. It was constructed by collecting naturally occurring examples through crowdsourcing and well-designed examples from linguistics publications. It consists of 5,382 examples.
18 PAPERS • 1 BENCHMARK
MVTec D2S is a benchmark for instance-aware semantic segmentation in an industrial domain. It contains 21,000 high-resolution images with pixel-wise labels of all object instances. The objects comprise groceries and everyday products from 60 categories. The benchmark is designed such that it resembles the real-world setting of an automatic checkout, inventory, or warehouse system. The training images only contain objects of a single class on a homogeneous background, while the validation and test sets are much more complex and diverse.
2 PAPERS • NO BENCHMARKS YET
MathQA significantly enhances the AQuA dataset with fully-specified operational programs.
94 PAPERS • 1 BENCHMARK
MeQSum is a dataset for medical question summarization. It contains 1,000 summarized consumer health questions.
25 PAPERS • 1 BENCHMARK
MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.
23 PAPERS • NO BENCHMARKS YET
MedleyDB 2.0 is a superset of MedleyDB, a dataset of annotated, royalty-free multitrack recordings. The second iteration of the dataset includes 74 new multitrack recordings, resulting in 194 songs in total.
0 PAPER • NO BENCHMARKS YET
ORCAS is a click-based dataset. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries.
15 PAPERS • NO BENCHMARKS YET
RPC is a large-scale retail product checkout dataset covering 200 retail SKUs. The SKUs are divided into 17 meta categories: puffed food, dried fruit, dried food, instant drink, instant noodles, dessert, drink, alcohol, milk, canned food, chocolate, gum, candy, seasoner, personal hygiene, tissue and stationery.
RadioTalk is a corpus of speech recognition transcripts sampled from talk radio broadcasts in the United States between October of 2018 and March of 2019. The corpus is intended for use by researchers in the fields of natural language processing, conversational analysis, and the social sciences. The corpus encompasses approximately 2.8 billion words of automatically transcribed speech from 284,000 hours of radio, together with metadata about the speech, such as geographical location, speaker turn boundaries, gender, and radio program information.
1 PAPER • NO BENCHMARKS YET
Includes considerable roll and pitch camera motion.
VehicleX is a large-scale synthetic dataset. Created in Unity, it contains 1,362 vehicles of various 3D models with fully editable attributes.
16 PAPERS • NO BENCHMARKS YET
The Vocal Folds dataset is a dataset for automatic segmentation of laryngeal endoscopic images. It consists of 8 sequences from 2 patients, containing 536 hand-segmented in vivo colour images of the larynx recorded during two different resection interventions, at a resolution of 512x512 pixels.
3 PAPERS • NO BENCHMARKS YET
Created from endoscopic video feeds of real-world surgical procedures. Overall, the data consists of 307 images, each of which is annotated for the organs and different surgical instruments present in the scene.