We provide all the expected data inputs to GUISS, such as meshes, texture images, and blend files. The generated datasets used in our experiments, along with the stereo depth estimations, can be downloaded. We have defined seven dataset types: scene_reconstructions, texture_variation, gaea_texture_variation, generative_texture, terrain_variation, rocks, and generative_texture_snow. Each dataset type contains renderings with varying values of different parameters such as lighting angle, texture images, and albedo. Place each dataset type folder under data/dataset/.
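A minimal sketch of verifying the expected folder layout described above; the folder names come from the dataset description, but the check script itself is only illustrative and not part of the official GUISS tooling:

```python
import os

# Expected GUISS dataset types, each placed as a folder under data/dataset/
# (names taken from the description above; this script is an illustrative check only).
DATASET_TYPES = [
    "scene_reconstructions",
    "texture_variation",
    "gaea_texture_variation",
    "generative_texture",
    "terrain_variation",
    "rocks",
    "generative_texture_snow",
]

root = os.path.join("data", "dataset")
for name in DATASET_TYPES:
    path = os.path.join(root, name)
    status = "ok" if os.path.isdir(path) else "MISSING"
    print(f"{path}: {status}")
```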
1 PAPER • NO BENCHMARKS YET
An object-centric version of Stylized COCO to benchmark texture bias and out-of-distribution robustness of vision models. See the ECCV 2022 paper and supplementary material for details.
A multimodal dataset of radio galaxies and their corresponding infrared hosts.
Automating the creation of catalogues for radio galaxies in next-generation deep surveys necessitates the identification of components within extended sources and their respective infrared hosts. We present RadioGalaxyNET, a multimodal dataset, tailored for machine learning tasks to streamline the automated detection and localization of multi-component extended radio galaxies and their associated infrared hosts. The dataset encompasses 4,155 instances of galaxies across 2,800 images, incorporating both radio and infrared channels. Each instance furnishes details about the extended radio galaxy class, a bounding box covering all components, a pixel-level segmentation mask, and the keypoint position of the corresponding infrared host galaxy. RadioGalaxyNET is the first dataset to include images from the highly sensitive Australian Square Kilometre Array Pathfinder (ASKAP) radio telescope, corresponding infrared images, and instance-level annotations for galaxy detection.
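A hypothetical container mirroring the per-instance annotations described above (extended radio galaxy class, a bounding box covering all components, a pixel-level segmentation mask, and the keypoint of the infrared host); field names are illustrative, not the dataset's actual schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RadioGalaxyInstance:
    galaxy_class: str                        # extended radio galaxy class
    bbox: Tuple[float, float, float, float]  # box covering all components (x, y, w, h)
    segmentation: List[List[float]]          # pixel-level mask as polygon(s)
    ir_host_keypoint: Tuple[float, float]    # position of the infrared host galaxy
```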
1 PAPER • 1 BENCHMARK
The Surgical Instruments Recognition Dataset is a groundbreaking collection of high-resolution images (1280×960 pixels) specifically designed for the recognition and categorization of surgical instruments. This dataset captures the intricate details and complexity of surgical tools, particularly when arranged in scenarios reminiscent of an operating room.
TAMPAR is a real-world dataset of parcel photos for tampering detection, with annotations in COCO format. For details, see the paper; for visual samples, see the project page.
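Several datasets in this list ship COCO-format annotations; a minimal sketch of reading them with pycocotools, where "annotations.json" is a placeholder path rather than the actual file name shipped with any particular dataset:

```python
from pycocotools.coco import COCO

# Load COCO-format annotations; the file name is a placeholder.
coco = COCO("annotations.json")

img_ids = coco.getImgIds()
img_info = coco.loadImgs(img_ids[0])[0]
ann_ids = coco.getAnnIds(imgIds=img_info["id"])
anns = coco.loadAnns(ann_ids)

print(img_info["file_name"], len(anns), "annotations")
for ann in anns:
    category = coco.loadCats(ann["category_id"])[0]["name"]
    print(category, ann["bbox"])
```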
We present a large-scale dataset for 3D urban scene understanding. Compared to existing datasets, our dataset consists of 75 outdoor urban scenes with diverse backgrounds, encompassing over 15,000 images. These scenes offer 360◦ hemispherical views, capturing diverse foreground objects illuminated under various lighting conditions. Additionally, our dataset encompasses scenes that are not limited to forward-driving views, addressing the limitations of previous datasets such as limited overlap and coverage between camera views. The closest pre-existing dataset for generalizable evaluation is DTU [2] (80 scenes) which comprises mostly indoor objects and does not provide multiple foreground objects or background scenes.
3 PAPERS • 1 BENCHMARK
We introduce the trapped yeast cell (TYC) dataset, a novel dataset for understanding instance-level semantics and motions of cells in microstructures. We release 105 densely annotated high-resolution brightfield microscopy images, including about 19k instance masks. We also release 261 curated video clips composed of 1,293 high-resolution microscopy images to facilitate unsupervised understanding of cell motions and morphology.
A Multi-Task 4D Radar-Camera Fusion Dataset for Autonomous Driving on Water Surfaces.
8 PAPERS • 2 BENCHMARKS
This is the first general Underwater Image Instance Segmentation (UIIS) dataset, containing 4,628 images across 7 categories with pixel-level annotations for the underwater instance segmentation task.
Revealing and systematically investigating the effectiveness of the proposed method in the real world requires a real low-light image dataset for instance segmentation. Since no suitable dataset exists, we collect and annotate a Low-light Instance Segmentation (LIS) dataset using a Canon EOS 5D Mark IV camera.
2 PAPERS • NO BENCHMARKS YET
Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.
An instance segmentation dataset of yeast cells in microstructures. The dataset includes 493 densely annotated microscopy images. For more information see the paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures".
Hi4D contains 4D textured scans of 20 subject pairs, 100 sequences, and a total of more than 11K frames. Hi4D provides rich interaction-centric annotations in 2D and 3D alongside accurately registered parametric body models.
7 PAPERS • NO BENCHMARKS YET
The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.
4 PAPERS • NO BENCHMARKS YET
The 'Me 163' was a Second World War fighter airplane and a result of the German air force's secret development programs. One of these airplanes is currently owned and displayed in the historic aircraft exhibition of the 'Deutsches Museum' in Munich, Germany. To gain insights into its history, design, and state of preservation, a complete CT scan was obtained using an industrial XXL computed tomography scanner at Fraunhofer EZRT.
Microscopy images of shrub cross sections for instance segmentation of tree rings.
Satlas is a remote sensing dataset and benchmark that is large in both breadth, covering a wide range of remote sensing applications, and scale, comprising 290M labels across 137 categories and 7 label modalities.
5 PAPERS • NO BENCHMARKS YET
Chinese Character Stroke Extraction (CCSE) is a benchmark containing two large-scale datasets: Kaiti CCSE (CCSE-Kai) and Handwritten CCSE (CCSE-HW). It is designed for stroke extraction problems.
Open Images is a computer vision dataset covering ~9 million images with labels spanning thousands of object categories. A subset of 1.9M images includes diverse annotation types.
3 PAPERS • NO BENCHMARKS YET
Occluded COCO is an automatically generated subset of the COCO val dataset, collecting partially occluded objects for a large variety of categories in real images in a scalable manner, where the target object is partially occluded but its segmentation mask remains connected.
2 PAPERS • 1 BENCHMARK
Real-world dataset of ~400 images of cuboid-shaped parcels with full 2D and 3D annotations in the COCO format.
Separated COCO is an automatically generated subset of the COCO val dataset, collecting separated objects for a large variety of categories in real images in a scalable manner, where the target object's segmentation mask is separated into distinct regions by the occluder.
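The distinction between Occluded COCO and Separated COCO hinges on whether the visible segmentation mask forms a single connected region or several. A minimal sketch of such a connectivity check, assuming binary numpy masks and scipy being available (this is not the datasets' official generation code):

```python
import numpy as np
from scipy import ndimage

def mask_region_count(mask: np.ndarray) -> int:
    """Number of connected regions in a binary instance mask."""
    _, num_regions = ndimage.label(mask)
    return num_regions

# Toy example: an occluder splitting an object mask into two regions.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 1:3] = True   # left part of the object
mask[2:6, 5:7] = True   # right part, separated by the occluder
print(mask_region_count(mask))  # -> 2, i.e. "separated" rather than merely occluded
```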
A set of 221 stereo videos captured by the SOCRATES stereo camera trap in a wildlife park in Bonn, Germany between February and July of 2022. A subset of frames is labeled with instance annotations in the COCO format.
DeepSportradar is a benchmark suite of computer vision tasks, datasets, and benchmarks for automated sport understanding. It currently supports four challenging tasks related to basketball: ball 3D localization, camera calibration, player instance segmentation, and player re-identification. For each task, a detailed description of the dataset, objective, performance metrics, and proposed baseline method is provided.
OmniCity is a dataset for omnipotent city understanding from multi-level and multi-view images. It contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City. This dataset introduces a new task of fine-grained building instance segmentation on street-level panorama images. It also provides new problem settings for existing tasks, such as cross-view image matching, synthesis, segmentation, detection, etc., and facilitates the development of new methods for large-scale city understanding, reconstruction, and simulation.
VizWiz-FewShot is a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images taken by people with visual impairments.
MOTFront provides photo-realistic RGB-D images with their corresponding instance segmentation masks, class labels, 2D & 3D bounding boxes, 3D geometry, 3D poses and camera parameters. The MOTFront dataset comprises 2,381 unique indoor sequences with a total of 60,000 images and is based on the 3D-FRONT dataset.
The Instance Segmentation task, an extension of the well-known Object Detection task, is of great help in many areas, such as precision agriculture: the ability to automatically identify plant organs and the diseases associated with them makes it possible to effectively scale and automate crop monitoring and disease control.
1 PAPER • 2 BENCHMARKS
The General Robust Image Task (GRIT) Benchmark is an evaluation-only benchmark for evaluating the performance and robustness of vision systems across multiple image prediction tasks, concepts, and data sources. GRIT aims to encourage the research community to pursue research directions toward more general and robust vision systems.
13 PAPERS • 8 BENCHMARKS
Our project (STPLS3D) aims to provide a large-scale aerial photogrammetry dataset with synthetic and real annotated 3D point clouds for semantic and instance segmentation tasks.
32 PAPERS • 3 BENCHMARKS
The TimberSeg 1.0 dataset is composed of 220 images showing wood logs in various environments and conditions in Canada. The images are densely annotated with segmentation masks for each log instance, as well as the corresponding bounding box and class label. The dataset aims at enabling autonomous forestry forwarders and therefore contains nearly 2,500 instances of wood logs from an operator's point of view. Images were taken in the forest, near the roadside, in lumberyards, and above timber-filled trailers. The logs were annotated from a grasping perspective, meaning that only logs lying on top of the piles and accessible for grasping are segmented.
Human fibrosarcoma HT1080WT (ATCC) cells at low cell densities embedded in 3D collagen type I matrices [1]. The time-lapse videos were recorded every 2 minutes for 16.7 hours and cover a field of view of 1002 × 1004 pixels with a pixel size of 0.802 μm/pixel. The videos were pre-processed to correct frame-to-frame drift artifacts, resulting in a final size of 983 × 985 pixels.
The Minor Irrigation Structures Check-Dam Dataset is a public dataset annotated by domain experts using images from Google Static Maps, intended for instance segmentation and object detection tasks.
BPCIS is a collection of 364 bacterial phase contrast images and corresponding label matrices for instance segmentation. Labels were made according to fluorescence channels where possible. Prior to manual annotation, images were automatically cropped into microcolonies and tiled into ensemble images to reduce the empty (non-cell) image regions for training and testing. Subsequent to annotation, we performed non-rigid registration of the phase contrast images to the cell masks.
The Aircraft Context Dataset, a composition of two inter-compatible large-scale and versatile image datasets focusing on manned aircraft and UAVs, is intended for training and evaluating classification, detection and segmentation models in aerial domains. Additionally, a set of relevant meta-parameters can be used to quantify dataset variability as well as the impact of environmental conditions on model performance.
iShape is an irregular shape dataset for instance segmentation. It contains six sub-datasets, one real and five synthetic, each representing a scene of a typical irregular shape.
KITTI-360 is a large-scale dataset that contains rich sensory information and full annotations. It is the successor of the popular KITTI dataset, providing more comprehensive semantic/instance labels in 2D and 3D, richer 360 degree sensory information (fisheye images and pushbroom laser scans), very accurate and geo-localized vehicle and camera poses, and a series of new challenging benchmarks.
161 PAPERS • 6 BENCHMARKS
The LIVECell (Label-free In Vitro image Examples of Cells) dataset is a large-scale microscopic image dataset for instance segmentation of individual cells in 2D cell cultures.
14 PAPERS • 1 BENCHMARK
This dataset was acquired with the Airphen (Hyphen, Avignon, France) six-band multi-spectral camera configured with the 450/570/675/710/730/850 nm bands at a 10 nm FWHM. It was acquired on the INRAe site in Montoldre (Allier, France, at 46°20'30.3"N 3°26'03.6"E) within the framework of the "RoSE challenge" funded by the French National Research Agency (ANR). The images contain bean plants with various natural weeds (yarrow, amaranth, geranium, plantago, etc.) and sown ones (mustard, goosefoot, mayweed, and ryegrass), under very distinct illumination conditions (shadow, morning, evening, full sun, cloudy, rain, ...). The ground truth is defined for each image by polygons around leaf boundaries; in addition, each polygon is labeled as crop or weed. (2020-06-11)
0 PAPER • NO BENCHMARKS YET
This publicly available dataset contains 1613 RGB-D images of field-grown broccoli plants. The dataset also includes the polygon and circle annotations of the broccoli heads.
Video sequences captured at a field on Campus Kleinaltendorf (CKA), University of Bonn, by BonBot-I, an autonomous weeding robot. The data was captured with an Intel RealSense D435i sensor mounted with a nadir view of the ground.
Revision: v1.0.0-full-20210527a. DOI: 10.5281/zenodo.4817662. Authors: J. Chazalon, E. Carlinet, Y. Chen, J. Perret, C. Mallet, B. Duménieu and T. Géraud. Official competition website: https://icdar21-mapseg.github.io/
We learn high-fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile social networking application, one of the most popular video-sharing applications across generations, which hosts short videos (10-15 seconds) of diverse dance challenges. We manually select more than 300 dance videos from TikTok dance challenge compilations, spanning different months, varieties, and types of dances, each capturing a single person performing moderate dance moves that do not generate excessive motion blur. For each video, we extract RGB images at 30 frames per second, resulting in more than 100K images. We segment these images using the Removebg application and compute the UV coordinates with DensePose.
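A minimal sketch of the frame-extraction step described above, assuming OpenCV is available; this is illustrative and not the authors' actual preprocessing code:

```python
import os
import cv2  # OpenCV; assumed available, not part of the dataset's official tooling

def extract_frames(video_path: str, out_dir: str) -> int:
    """Dump every frame of a video as a PNG image; returns the frame count."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{count:06d}.png"), frame)
        count += 1
    cap.release()
    return count
```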
This data set contains 775 video sequences, captured in the wildlife park Lindenthal (Cologne, Germany) as part of the AMMOD project, using an Intel RealSense D435 stereo camera. In addition to color and infrared images, the D435 is able to infer the distance (or “depth”) to objects in the scene using stereo vision. Observed animals include various birds (at daytime) and mammals such as deer, goats, sheep, donkeys, and foxes (primarily at nighttime). A subset of 412 images is annotated with a total of 1038 individual animal annotations, including instance masks, bounding boxes, class labels, and corresponding track IDs to identify the same individual over the entire video.
CryoNuSeg is a fully annotated FS-derived cryosectioned and H&E-stained nuclei instance segmentation dataset. The dataset contains images from 10 human organs that were not exploited in other publicly available datasets, and is provided with three manual mark-ups to allow measuring intra-observer and inter-observer variability.
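Intra- and inter-observer variability between the three manual mark-ups is typically quantified with an overlap measure such as the Dice coefficient; a minimal sketch, assuming the mark-ups are available as binary numpy masks (not the dataset's official evaluation code):

```python
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice coefficient between two binary masks (1.0 = perfect agreement)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0
```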
11 PAPERS • NO BENCHMARKS YET
Video sequences from a glasshouse environment on Campus Kleinaltendorf (CKA), University of Bonn, captured by PATHoBot, a glasshouse monitoring robot.
The dataset of Thermal Bridges on Building Rooftops (TBBR dataset) consists of annotated combined RGB and thermal drone images with a height map. All images were converted to a uniform format of 3000×4000 pixels, aligned, and cropped to 2400×3400 pixels to remove empty borders.
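A minimal sketch of the crop from 3000×4000 to 2400×3400 described above, assuming a centered crop (the actual crop placement used for the dataset is not specified here):

```python
import numpy as np

def center_crop(img: np.ndarray, out_h: int = 2400, out_w: int = 3400) -> np.ndarray:
    """Center-crop an aligned image to out_h x out_w (centered placement is an assumption)."""
    h, w = img.shape[:2]
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]

rgb = np.zeros((3000, 4000, 3), dtype=np.uint8)  # placeholder image
print(center_crop(rgb).shape)  # (2400, 3400, 3)
```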
2 PAPERS • 2 BENCHMARKS