The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of the larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
6,965 PAPERS • 52 BENCHMARKS
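The centering step in the MNIST description above can be sketched in a few lines. This is a minimal illustration, not the original NIST/MNIST preprocessing code: the function name `center_by_mass`, the list-of-lists image representation, the rounding to whole pixels, and the clamping that keeps the digit inside the field are all assumptions made for the example.

```python
def center_by_mass(digit, size=28):
    """Place a small grey-level image (list of rows) in a size x size field
    so that its intensity center of mass lands, to the nearest pixel,
    at the center of the field. Hypothetical helper for illustration."""
    h, w = len(digit), len(digit[0])
    total = sum(sum(row) for row in digit)
    if total == 0:
        raise ValueError("an empty image has no center of mass")

    # Intensity-weighted mean row and column of the input image.
    cy = sum(r * v for r, row in enumerate(digit) for v in row) / total
    cx = sum(c * v for row in digit for c, v in enumerate(row)) / total

    # Offset that moves the center of mass onto the field center.
    # Assumption: round to whole pixels and clamp so nothing is cropped.
    center = (size - 1) / 2
    top = min(max(round(center - cy), 0), size - h)
    left = min(max(round(center - cx), 0), size - w)

    field = [[0] * size for _ in range(size)]
    for r, row in enumerate(digit):
        field[top + r][left:left + w] = row
    return field


# A uniform 20x20 block has its center of mass in the middle,
# so it ends up with a 4-pixel margin on every side.
block = [[1] * 20 for _ in range(20)]
centered = center_by_mass(block)
```

A digit with uneven stroke density would land off the geometric middle of the field, which is the difference between center-of-mass centering and simple bounding-box centering.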
The ListOps examples consist of summary operations on lists of single-digit integers, written in prefix notation. The full sequence has a corresponding solution which is also a single-digit integer, thus making it a ten-way balanced classification problem. For example, [MAX 2 9 [MIN 4 7 ] 0 ] has the solution 9. Each operation has a corresponding closing square bracket that defines the list of numbers for the operation. In this example, MIN operates on {4, 7}, while MAX operates on {2, 9, 4, 0}, where 4 is the result of the inner MIN.
78 PAPERS • NO BENCHMARKS YET
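The bracket-matching rule in the ListOps description above can be illustrated with a small stack-based evaluator. This is a sketch under stated assumptions rather than the dataset's reference implementation: the function name `evaluate` and the whitespace-separated token format are taken from the example string, and only the two operators named in the description (MAX and MIN) are handled.

```python
# Only the operators named in the description; others are out of scope here.
OPS = {"MAX": max, "MIN": min}


def evaluate(expr: str) -> int:
    """Evaluate a whitespace-tokenized ListOps expression (hypothetical helper)."""
    stack = []  # one (operator name, argument list) frame per open bracket
    for tok in expr.split():
        if tok.startswith("["):
            # "[MAX" opens a new list for the operator after the bracket.
            stack.append((tok[1:], []))
        elif tok == "]":
            # The closing bracket ends the innermost open list.
            op, args = stack.pop()
            value = OPS[op](args)
            if not stack:
                return value  # the outermost list is closed
            stack[-1][1].append(value)  # feed the result to the parent list
        else:
            stack[-1][1].append(int(tok))
    raise ValueError("unbalanced expression: missing closing bracket")


result = evaluate("[MAX 2 9 [MIN 4 7 ] 0 ]")  # 9, as in the example above
```

The inner list reduces to 4 first, so the outer MAX sees 2, 9, 4 and 0.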
A large and realistic natural language question answering dataset.
60 PAPERS • 1 BENCHMARK
SCIREX is a document-level information extraction (IE) dataset that encompasses multiple IE tasks, including salient entity identification and document-level N-ary relation identification from scientific articles. The dataset is annotated by integrating automatic and human annotations, leveraging existing scientific knowledge resources.
32 PAPERS • 2 BENCHMARKS
We propose an efficient high-throughput scheme for the discovery of stable crystalline phases. Our approach is based on the transmutation of known compounds, through the substitution of atoms in the crystal structure with chemically similar ones. The concept of similarity is defined quantitatively using a measure of chemical replaceability, extracted by data-mining experimental databases. In this way we build 189,981 possible crystal phases, including 18,479 that are on the convex hull of stability. The resulting success rate of 9.72% is at least one order of magnitude better than the usual success rate of systematic high-throughput calculations for a specific family of materials, and comparable with speed-up factors of machine learning filtering procedures. As a characterization of the set of 18,479 stable compounds, we calculate their electronic band gaps, magnetic moments, and hardness. Our approach, which can be used as a filter on top of any high-throughput scheme, enables us to ef
1 PAPER • NO BENCHMARKS YET
WIKIOG is a public collection of over 1.75 million document-outline pairs for research on the outline generation (OG) task.