Benchmarking
2220 papers with code • 2 benchmarks • 6 datasets
Most implemented papers
MMDetection: Open MMLab Detection Toolbox and Benchmark
In this paper, we introduce the various features of this toolbox.
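As a rough illustration of how the toolbox is typically used, here is a minimal inference sketch assuming the mmdet 2.x Python API; the config and checkpoint paths are placeholders, not files shipped with this page.

```python
# Minimal MMDetection inference sketch (mmdet 2.x style API); paths below are examples only.
from mmdet.apis import init_detector, inference_detector

config_file = "configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py"   # example config
checkpoint_file = "checkpoints/faster_rcnn_r50_fpn_1x_coco.pth"      # example weights

model = init_detector(config_file, checkpoint_file, device="cuda:0")

# Run single-image inference; the result is a list of per-class detection arrays.
result = inference_detector(model, "demo.jpg")
```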
Learning Transferable Visual Models From Natural Language Supervision
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
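CLIP instead learns from image–text pairs and classifies zero-shot from text prompts; a minimal sketch, assuming the open-source `clip` package from the openai/CLIP repository and a placeholder image file:

```python
# Zero-shot classification sketch with the open-source `clip` package; "cat.jpg" is a placeholder.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)  # zero-shot class probabilities over the prompts
```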
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category.
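A minimal loading sketch, assuming the torchvision copy of the dataset (the standard 60,000/10,000 train/test split):

```python
# Load Fashion-MNIST via torchvision: 28x28 grayscale images, 10 classes.
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_set = datasets.FashionMNIST(root="data", train=True, download=True, transform=transform)
test_set = datasets.FashionMNIST(root="data", train=False, download=True, transform=transform)

print(len(train_set), len(test_set))  # 60000 10000
image, label = train_set[0]
print(image.shape)                    # torch.Size([1, 28, 28])
```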
CIDEr: Consensus-based Image Description Evaluation
We propose a novel paradigm for evaluating image descriptions that uses human consensus.
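At its core, CIDEr scores a candidate caption by the TF-IDF-weighted n-gram similarity to the human reference captions, averaged over references and n-gram orders. A toy sketch of that idea follows; it is not the official implementation, which additionally clips n-gram counts and adds a length penalty in the CIDEr-D variant.

```python
# Toy CIDEr-style score: cosine similarity of TF-IDF-weighted n-gram vectors, averaged over
# references and over n-gram orders 1..n. Illustrative only, not the reference implementation.
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cider_n(candidate, references, doc_freq, num_images, n=4):
    """doc_freq maps an n-gram to the number of images whose references contain it."""
    def tfidf(counts):
        # Weight each n-gram count by log(num_images / document frequency).
        return {g: c * math.log(num_images / max(doc_freq.get(g, 1), 1)) for g, c in counts.items()}

    def norm(vec):
        return math.sqrt(sum(v * v for v in vec.values()))

    score = 0.0
    for order in range(1, n + 1):
        cand = tfidf(Counter(ngrams(candidate, order)))
        sims = []
        for ref in references:
            refv = tfidf(Counter(ngrams(ref, order)))
            dot = sum(w * refv.get(g, 0.0) for g, w in cand.items())
            denom = norm(cand) * norm(refv)
            sims.append(dot / denom if denom else 0.0)
        score += sum(sims) / len(sims)
    return score / n
```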
The StarCraft Multi-Agent Challenge
In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem for cooperative multi-agent reinforcement learning.
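A minimal interaction loop, assuming the `smac` Python package from the oxwhirl/smac repository and a local StarCraft II installation; "8m" is one of the standard micromanagement maps.

```python
# Random-agent loop on one SMAC episode, following the style of the smac package examples.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="8m")
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated = False
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)            # mask of legal actions
        actions.append(np.random.choice(np.nonzero(avail)[0]))   # pick a random legal action
    reward, terminated, info = env.step(actions)
env.close()
```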
Benchmarking Deep Reinforcement Learning for Continuous Control
Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning.
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
We propose a new dataset, ImageNet-P, which enables researchers to benchmark a classifier's robustness to common perturbations.
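ImageNet-P measures prediction stability along short perturbation sequences; below is a toy sketch of the underlying "flip probability" for a single sequence (the paper's full metric averages over sequences and normalizes against a baseline model).

```python
# Toy flip probability: how often the top-1 prediction changes between consecutive frames
# of one perturbation sequence. Illustrative only, not the paper's full normalized metric.
def flip_probability(predictions):
    """predictions: list of top-1 class ids along one perturbation sequence."""
    flips = sum(p != q for p, q in zip(predictions, predictions[1:]))
    return flips / (len(predictions) - 1)

print(flip_probability([3, 3, 7, 7, 3]))  # 0.5
```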
Benchmarking Graph Neural Networks
In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library
An adversarial example library for constructing attacks, building defenses, and benchmarking both.
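To make the attack side concrete, here is a plain-PyTorch sketch of the fast gradient sign method, one of the attacks such libraries package; this illustrates the idea and is not the CleverHans API.

```python
# FGSM sketch: perturb the input in the direction of the sign of the loss gradient.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Return an adversarial example x + eps * sign(grad_x loss(model(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```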
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.
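A hedged loading sketch via the Hugging Face `datasets` library; the "ms_marco" hub name, the "v2.1" config, and the field names are assumptions about that hub entry rather than anything stated in the paper.

```python
# Load MS MARCO from the Hugging Face hub (dataset id and fields assumed, see note above).
from datasets import load_dataset

marco = load_dataset("ms_marco", "v2.1", split="train")
example = marco[0]
print(example["query"])     # a query derived from a real user search
print(example["passages"])  # candidate passages with relevance annotations
```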