The WebVision dataset is designed to facilitate the research on learning visual representation from noisy web data. It is a large scale web images dataset that contains more than 2.4 million of images crawled from the Flickr website and Google Images search.
The same 1,000 concepts as the ILSVRC 2012 dataset are used for querying images, such that a bunch of existing approaches can be directly investigated and compared to the models trained from the ILSVRC 2012 dataset, and also makes it possible to study the dataset bias issue in the large scale scenario. The textual information accompanied with those images (e.g., caption, user tags, or description) are also provided as additional meta information. A validation set contains 50,000 images (50 images per category) is provided to facilitate the algorithmic development.