ShipRSImageNet is a large-scale fine-grained dataset for ship detection in high-resolution optical remote sensing images. The dataset contains 3,435 images from various sensors, satellite platforms, locations, and seasons. Each image is around 930×930 pixels and contains ships with different scales, orientations, and aspect ratios. The images are annotated by experts in satellite image interpretation, and the ships are categorized into 50 object categories. The fully annotated ShipRSImageNet contains 17,573 ship instances. There are five critical contributions of the proposed ShipRSImageNet dataset compared with other existing remote sensing image datasets. Images are collected from various remote sensors covering multiple ports worldwide and have large variations in size, spatial resolution, image quality, orientation, and environment. Ships are hierarchically classified into four levels and 50 ship categories. The number of images, ship instances, and ship categories is larger than that in
0 PAPER • NO BENCHMARKS YET
Comprises 171,191 video segments from 346 high-quality soccer games. The database contains 702,096 bounding boxes, 37,709 essential event labels with time boundaries, and 17,115 highlight annotations for object detection, action recognition, temporal action localization, and highlight detection tasks.
5 PAPERS • NO BENCHMARKS YET
An open source Multi-View Overhead Imagery dataset with 27 unique looks from a broad range of viewing angles (-32.5 degrees to 54.0 degrees). Each of these images covers the same 665 square km geographic extent and is annotated with 126,747 building footprint labels, enabling direct assessment of the impact of viewpoint perturbation on model performance.
3 PAPERS • NO BENCHMARKS YET
This dataset is an extremely challenging set of over 3,000 original stair images captured and crowdsourced from over 500 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
A new dataset for streaming classification consisting of temporally correlated images from 51 distinct object categories and additional evaluation classes outside of the training distribution to test novelty recognition.
A real-world image dataset that contains more than 900 images generated from 26 street cameras and 7 object categories annotated with detailed bounding boxes. The data distribution is non-IID and unbalanced, reflecting characteristic real-world federated learning scenarios.
1 PAPER • NO BENCHMARKS YET
This dataset is an extremely challenging set of over 7,000 original suitcase/luggage images captured and crowdsourced from over 800 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at DC Labs.
Synscapes is a synthetic dataset for street scene parsing created using photorealistic rendering techniques; its authors report state-of-the-art results for training and validation as well as new types of analysis.
43 PAPERS • 1 BENCHMARK
TJU-DHD is a high-resolution dataset for object detection and pedestrian detection. The dataset contains 115,354 high-resolution images (52% images have a resolution of 1624×1200 pixels and 48% images have a resolution of at least 2,560×1,440 pixels) and 709,330 labelled objects in total with a large variance in scale and appearance.
11 PAPERS • 2 BENCHMARKS
TTPLA is a public dataset of aerial images of Transmission Towers (TTs) and Power Lines (PLs). It can be used for detection and segmentation of transmission towers and power lines. It consists of 1,100 images with a resolution of 3,840×2,160 pixels, along with 8,987 manually labelled instances of TTs and PLs.
7 PAPERS • NO BENCHMARKS YET
TinyPerson is a benchmark for tiny object detection at long distances and against massive backgrounds. The images in TinyPerson are collected from the Internet. First, high-resolution videos are collected from different websites. Second, images are sampled from each video every 50 frames. Then images with a certain degree of repetition (homogeneity) are deleted, and 72,651 objects in the remaining images are annotated by hand with bounding boxes.
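The collection pipeline described for TinyPerson (sample every 50th frame, then discard repetitive images) can be sketched as follows. This is a minimal illustration under stated assumptions, not the TinyPerson authors' code: frames are toy lists of grayscale pixel values, and the near-duplicate test (mean absolute pixel difference against the last kept frame, compared with a hypothetical `threshold`) is an assumption, since the description does not say how homogeneity was measured.

```python
from typing import List, Sequence

# Toy stand-in for a decoded video frame: a flat list of grayscale pixel values.
Frame = Sequence[int]


def sample_every_n(frames: Sequence[Frame], step: int = 50) -> List[int]:
    """Return the indices of every `step`-th frame, starting at frame 0."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, len(frames), step))


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def drop_near_duplicates(
    frames: Sequence[Frame], indices: Sequence[int], threshold: float = 5.0
) -> List[int]:
    """Keep a sampled index only if it differs enough from the last kept frame.

    ASSUMPTION: the threshold-based criterion is illustrative; the dataset
    description does not specify how repetitive images were identified.
    """
    kept: List[int] = []
    for i in indices:
        if not kept or mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Toy video: 100 identical dark frames followed by 100 identical bright frames.
frames = [[0] * 4 for _ in range(100)] + [[100] * 4 for _ in range(100)]
candidates = sample_every_n(frames, step=50)
print(candidates)                                # [0, 50, 100, 150]
print(drop_near_duplicates(frames, candidates))  # [0, 100]
```

Comparing against the last kept frame (rather than all kept frames) keeps the check linear in the number of samples; a real pipeline would likely use a perceptual hash or feature similarity instead of raw pixel differences.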
19 PAPERS • NO BENCHMARKS YET
This dataset, commissioned by the Yandex Business Directory, contains 10,000 photos of organization information signs shot in the Russian Federation along with the INN (taxpayer ID) and OGRN (Primary State Registration Number) codes shown on these signs. Toloka was used for both capturing photos and recognizing INN and OGRN codes.
This dataset contains 1,244 images of hot and cold water meters, together with their readings and the coordinates of the displays showing those readings. Each image contains exactly one water meter. The archive also includes images of the segmentation results (masks and collages). Toloka was used for photo capturing, segmentation, and recognizing the readings.
Consists of 100 challenging video sequences captured from real-world traffic scenes (over 140,000 frames with rich annotations, including occlusion, weather, vehicle category, truncation, and vehicle bounding boxes) for object detection, object tracking, and multi-object tracking (MOT) systems.
50 PAPERS • 1 BENCHMARK
Unconstrained Face Detection Dataset (UFDD) aims to fuel further research in unconstrained face detection.
13 PAPERS • NO BENCHMARKS YET
Includes 950 real-world underwater images, 890 of which have the corresponding reference images.
58 PAPERS • 1 BENCHMARK
UVO is a new benchmark for open-world class-agnostic object segmentation in videos. Besides shifting the problem focus to the open-world setup, UVO is significantly larger, providing approximately 8 times more videos compared with DAVIS, and 7 times more mask (instance) annotations per video compared with YouTube-VOS and YouTube-VIS. UVO is also more challenging as it includes many videos with crowded scenes and complex background motions. Some highlights of the dataset include:
23 PAPERS • 3 BENCHMARKS
VEDAI is a dataset for Vehicle Detection in Aerial Imagery, provided as a tool to benchmark automatic target recognition algorithms in unconstrained environments. The vehicles contained in the database, in addition to being small, exhibit different variabilities such as multiple orientations, lighting/shadowing changes, specularities, and occlusions. Furthermore, each image is available in several spectral bands and resolutions. A precise experimental protocol is also given, ensuring that the experimental results obtained by different people can be properly reproduced and compared. We also give the performance of some baseline algorithms on this dataset, for different settings of these algorithms, to illustrate the difficulties of the task and provide baseline comparisons.
5 PAPERS • 1 BENCHMARK
Collects 60 reference sequences and 540 impaired sequences.
Includes 5000 spatially aligned RGBT image pairs with ground truth annotations. VT5000 has 11 challenges collected in different scenes and environments for exploring the robustness of algorithms.
23 PAPERS • NO BENCHMARKS YET
The Waymo Open Dataset comprises high-resolution sensor data collected by autonomous vehicles operated by the Waymo Driver in a wide variety of conditions.
383 PAPERS • 12 BENCHMARKS
WiderPerson contains a total of 13,382 images with 399,786 annotations, i.e., 29.87 annotations per image, which means the dataset contains dense pedestrians with various kinds of occlusion. The dataset is therefore extremely challenging due to large variations in scenario and occlusion, which makes it suitable for evaluating pedestrian detectors in the wild.
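The annotations-per-image figure for WiderPerson is a simple ratio of the two counts given above; the short check below reproduces it. The helper name `annotations_per_image` is hypothetical and used only for illustration.

```python
def annotations_per_image(num_annotations: int, num_images: int) -> float:
    """Average number of annotated objects per image (annotation density)."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_annotations / num_images


# WiderPerson counts quoted in the description.
density = annotations_per_image(399_786, 13_382)
print(f"{density:.2f}")  # 29.87
```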
9 PAPERS • 1 BENCHMARK
Fisheye cameras are commonly employed for obtaining a large field of view in surveillance, augmented reality, and in particular automotive applications. In spite of their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. WoodScape is an extensive fisheye automotive dataset named after Robert Wood, who invented the fisheye camera in 1906. WoodScape comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection, and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images, and annotations for the other tasks are provided for over 100,000 images.
49 PAPERS • 1 BENCHMARK
YouTube-BoundingBoxes (YT-BB) is a large-scale data set of video URLs with densely-sampled object bounding box annotations. The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the MS COCO label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second.
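Annotating at 1 frame per second means a YT-BB segment of about 19 s yields roughly 19 labelled frames. The sketch below lists the sampled timestamps for one segment; the integer-millisecond timestamps, the half-open window `[start, start + duration)`, and the function name `sample_times` are illustrative assumptions, not part of the YT-BB specification.

```python
from typing import List


def sample_times(start_ms: int, duration_ms: int, fps: int = 1) -> List[int]:
    """Millisecond timestamps of frames sampled at `fps` in [start, start + duration).

    ASSUMPTION: `fps` divides 1000 evenly, so the step is an exact integer.
    """
    if fps <= 0 or 1000 % fps != 0:
        raise ValueError("fps must be a positive divisor of 1000")
    step_ms = 1000 // fps
    return list(range(start_ms, start_ms + duration_ms, step_ms))


times = sample_times(0, 19_000)  # a 19 s segment annotated at 1 fps
print(len(times))   # 19
print(times[:3])    # [0, 1000, 2000]
```

Working in integer milliseconds avoids the floating-point drift that accumulates when repeatedly adding a fractional step such as 0.1 s.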
7 PAPERS • 1 BENCHMARK
Functional Map of the World (fMoW) is a dataset that aims to inspire the development of machine learning models capable of predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features.
113 PAPERS • NO BENCHMARKS YET
iSAID contains 655,451 object instances for 15 categories across 2,806 high-resolution images. The images of iSAID are the same as those of the DOTA-v1.0 dataset, which are mainly collected from Google Earth, with others taken by the JL-1 satellite and by the GF-2 satellite of the China Centre for Resources Satellite Data and Application.
60 PAPERS • 3 BENCHMARKS