For the Drone-vs-Bird Detection Challenge 2021, 77 video sequences have been made available as training data. These sequences originate from the previous installment of the challenge and were collected using MPEG4-coded cameras by the SafeShore project, the Fraunhofer IOSB research institute, and the ALADDIN2 project. On average, a video sequence consists of 1,384 frames, and each frame contains 1.12 annotated drones. The sequences were recorded with both static and moving cameras, and the resolution varies between 720×576 and 3840×2160 pixels. In total, 8 different types of drones appear in the dataset: 3 fixed-wing and 5 rotary-wing. For each video, a separate annotation file is provided, containing the frame number and the bounding box (expressed as [topx topy width height]) for the frames in which drones enter the scene.
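The bounding-box convention can be illustrated with a small parsing sketch. Note that only the [topx topy width height] box format is stated above; the one-annotation-per-line layout ("frame topx topy width height") assumed here is hypothetical.

```python
# Hypothetical annotation line layout: "frame topx topy width height".
# Only the [topx topy width height] box convention comes from the
# challenge description; the rest is an assumption for illustration.
def parse_annotation_line(line):
    frame, x, y, w, h = (int(v) for v in line.split())
    # convert [topx, topy, width, height] to corner form (x1, y1, x2, y2)
    return frame, (x, y, x + w, y + h)

print(parse_annotation_line("17 245 130 32 24"))  # (17, (245, 130, 277, 154))
```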
1 PAPER • 1 BENCHMARK
The EXPO-HD Dataset is a dataset of Expo whiteboard markers for the purpose of instance segmentation. The dataset contains two subsets, both of which include instance segmentation labels.
1 PAPER • NO BENCHMARKS YET
This is a detailed description of the dataset: a datasheet for the dataset as proposed by Gebru et al.
Fine-Grained Vehicle Detection (FGVD) is a dataset for fine-grained vehicle detection captured from a moving camera mounted on a car. The FGVD dataset is challenging as it has vehicles in complex traffic scenarios with intra-class and inter-class variations in types, scale, pose, occlusion, and lighting conditions.
FSVOD-500 is a large-scale video dataset comprising 500 classes with class-balanced videos in each category for few-shot learning. FSVOD-500 is the first benchmark specially designed for few-shot video object detection, for evaluating the performance of a given model on novel classes.
FractureAtlas is a musculoskeletal bone fracture dataset with annotations for deep learning tasks such as classification, localization, and segmentation. The dataset contains a total of 4,083 X-ray images with annotations in COCO, VGG, YOLO, and Pascal VOC formats. The dataset is made freely available for any purpose: the data may be copied, shared, or redistributed in any medium or format, and may be adapted, remixed, transformed, and built upon. It is licensed under CC BY 4.0. Note that correctly using the dataset requires knowledge of the medical and radiology fields to interpret results and draw conclusions, and the possibility of labeling errors should also be considered.
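Since the annotations ship in standard formats, loading them is straightforward. The sketch below shows the standard COCO schema, where each bounding box is stored as [x, y, width, height]; the file name and category name used here are hypothetical placeholders, not values from the dataset.

```python
import json

# Minimal COCO-style annotation structure; the file/category names are
# hypothetical, but COCO's bbox convention [x, y, width, height] is standard.
coco_text = json.dumps({
    "images": [{"id": 1, "file_name": "xray_0001.jpg"}],
    "annotations": [{"image_id": 1, "category_id": 1, "bbox": [40, 60, 120, 80]}],
    "categories": [{"id": 1, "name": "fracture"}],
})
coco = json.loads(coco_text)

# group boxes by the image they belong to
boxes_by_image = {}
for ann in coco["annotations"]:
    boxes_by_image.setdefault(ann["image_id"], []).append(ann["bbox"])
print(boxes_by_image)  # {1: [[40, 60, 120, 80]]}
```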
The GDIT Aerial Airport dataset consists of aerial images containing instances of parked airplanes. All plane types have been grouped into a single classification named "airplane".
Understanding comprehensive assembly knowledge from videos is critical for the future ultra-intelligent industry. To enable technological breakthroughs, we present HA-ViD, an assembly video dataset that features representative industrial assembly scenarios, a natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and fine-grained action annotations covering subject, action verb, manipulated object, target object, and tool. We provide 3,222 multi-view and multi-modality videos, 1.5M frames, 96K temporal labels, and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection, and multi-object tracking. Importantly, we analyze their performance and the further reasoning steps needed to comprehend knowledge of assembly progress and process efficiency.
The Hands Guns and Phones (HGP) dataset contains 2,199 images (1,989 for training and 210 for testing) of people using guns or phones in real-world scenarios (people reviewing phones, performing shooting drills, or making calls). Every image in the dataset is labeled with bounding boxes for hands, phones, and guns. All images were collected from YouTube videos and have different sizes.
The instance segmentation task, an extension of the well-known object detection task, is of great help in many areas such as precision agriculture: being able to automatically identify plant organs and the diseases possibly associated with them makes it possible to effectively scale and automate crop monitoring and disease control.
1 PAPER • 2 BENCHMARKS
This dataset contains 775 video sequences, captured in the Lindenthal wildlife park (Cologne, Germany) as part of the AMMOD project, using an Intel RealSense D435 stereo camera. In addition to color and infrared images, the D435 is able to infer the distance (or “depth”) to objects in the scene using stereo vision. Observed animals include various birds (at daytime) and mammals such as deer, goats, sheep, donkeys, and foxes (primarily at nighttime). A subset of 412 images is annotated with a total of 1,038 individual animal annotations, including instance masks, bounding boxes, class labels, and corresponding track IDs to identify the same individual over the entire video.
Loucount is a retail object detection and counting dataset with rich annotations collected in retail stores; it consists of 50,394 images with more than 1.9 million object instances across 140 categories.
METU-ALET is an image dataset for the detection of tools in the wild. The dataset has annotations for tools belonging to categories such as farming, gardening, office, stonemasonry, vehicle, woodworking, and workshop. The images in the dataset contain a total of 22,841 bounding boxes across 49 different tool categories.
The Minor Irrigation Structures Check-Dam Dataset is a public dataset for instance segmentation and object detection tasks, annotated by domain experts using images from Google static maps.
MlGesture is a dataset for hand gesture recognition tasks, recorded in a car with 5 different sensor types at two different viewpoints. The dataset contains over 1300 hand gesture videos from 24 participants and features 9 different hand gesture symbols. One sensor cluster with five different cameras is mounted in front of the driver in the center of the dashboard. A second sensor cluster is mounted on the ceiling looking straight down.
The Marine Microalgae Detection in Microscopy Images dataset contains 937 images in which all objects were annotated, for a total of 4,201 annotated objects. The training set contains 537 images and the testing set contains 430 images.
MuCeD is a dataset carefully curated and validated by expert pathologists from the All India Institute of Medical Sciences (AIIMS), Delhi, India. The H&E-stained histopathology images of the human duodenum in MuCeD were captured through an Olympus BX50 microscope at 20x zoom using a DP26 camera, with each image being 1920x2148 in dimension. The dataset has 55 images, with bounding boxes for 2,090 IELs and 6,518 ENs annotated using the LabelMe software and further validated by multiple pathologists. These cells are selected from the epithelial area, a region of interest that has been explicitly segmented by experts. The epithelial area denotes the area of continuous villi and is used for cell detection, whereas the rest of the area is masked out. Further, each image is sliced into 9 subimages and each subimage is rescaled to 640x640 before being given as input to object detection models. The 55 images are divided into five folds of 11 images each, and 5-fold cross-validation numbers are reported.
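The slicing step described above can be sketched as follows. This is a minimal illustration, assuming a 2148x1920 (height x width) orientation and a 3x3 grid; nearest-neighbour sampling stands in for whatever resizer the authors actually used.

```python
import numpy as np

def tile_and_rescale(img, grid=(3, 3), out_size=(640, 640)):
    """Split an image into grid tiles, then rescale each tile to out_size
    using nearest-neighbour sampling (a stand-in for any real resizer)."""
    h, w = img.shape[:2]
    th, tw = h // grid[0], w // grid[1]
    tiles = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            tile = img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            # nearest-neighbour index maps from output to tile coordinates
            ys = np.arange(out_size[0]) * th // out_size[0]
            xs = np.arange(out_size[1]) * tw // out_size[1]
            tiles.append(tile[ys][:, xs])
    return tiles

# Assuming 2148x1920 (h x w): a 3x3 grid yields nine 716x640 tiles,
# each rescaled to 640x640 before being fed to a detector.
dummy = np.zeros((2148, 1920), dtype=np.uint8)
tiles = tile_and_rescale(dummy)
print(len(tiles), tiles[0].shape)  # 9 (640, 640)
```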
Natural Adversarial Objects (NAO) is a new dataset to evaluate the robustness of object detection models. NAO contains 7,934 images and 9,943 objects that are unmodified and representative of real-world scenarios, but cause state-of-the-art detection models to misclassify with high confidence.
This is a high-quality large-scale Night Object Detection (NOD) dataset of outdoor images targeting low-light object detection. The dataset contains more than 7K images and 46K annotated objects (with bounding boxes) that belong to classes: person, bicycle, and car. The photos were taken on the streets at evening hours, and thus all images present low-light conditions to a varying degree of severity.
An object-centric version of Stylized COCO to benchmark texture bias and out-of-distribution robustness of vision models. See the ECCV 22 paper and supplementary material for details.
An object detection dataset featuring people walking on grass, captured aboard a UAV. The dataset includes precise metadata about altitude, viewing angle, and more.
The SeaDronesSee-Object Detection v2 (S-ODv2) dataset contains 14,227 RGB images (training: 8,930; validation: 1,547; testing: 3,750). The images are captured from various altitudes and viewing angles, ranging from 5 to 260 meters and 0° to 90° (gimbal pitch angle), while providing the respective metadata for altitude, viewing angle, and other attributes for almost all frames.
SIDOD is a new, publicly-available image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects (chosen randomly from the 21 object models of the YCB dataset) and flying distractors.
This is a dataset to benchmark real-time embedded object detection models for RoboCup SSL (Small Size League).
STN PLAD is a high-resolution and real-world image dataset of multiple high-voltage power line components. It has 2,409 annotated objects divided into five classes: transmission tower, insulator, spacer, tower plate, and Stockbridge damper, which vary in size (resolution), orientation, illumination, angulation, and background.
TAMPAR is a real-world dataset of parcel photos for tampering detection with annotations in COCO format. For details see the paper and for visual samples the project page. Features are:
The UAVVaste dataset currently consists of 772 images and 3,716 annotations. The main motivation for creating the dataset was the lack of domain-specific data in the datasets widely used for object detection benchmarking. The dataset is made publicly available and is intended to be expanded.
UDA-CH contains 16 objects that cover a variety of artworks which can be found in a museum like sculptures, paintings and books. Specifically, the dataset has been collected inside the cultural site “Galleria Regionale di Palazzo Bellomo” located in Siracusa, Italy.
This is the first general Underwater Image Instance Segmentation (UIIS) dataset, containing 4,628 images across 7 categories with pixel-level annotations for the underwater instance segmentation task.
The Universal-Scale object detection Benchmark (USB) is a benchmark for object detection that has variations in object scales and image domains by incorporating COCO with the recently proposed Waymo Open Dataset and Manga109-s dataset. To enable fair comparison, USB establishes different protocols by defining multiple thresholds for training epochs and evaluation image resolutions.
VizWiz-FewShot is a few-shot localization dataset originating from photographers who authentically were trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments.
The YouTube Gun Detection Dataset (YouTube-GDD) is collected from 343 high-definition YouTube videos and contains 5,000 well-chosen images, in which 16,064 instances of gun and 9,046 instances of person are annotated. Compared to other datasets, YouTube-GDD is "dynamic", containing rich contextual information.
To encourage reproducible research, a labeled MultiRAW dataset containing >7k RAW images acquired using multiple camera sensors is made publicly accessible for RAW-domain processing.
This dataset is an extremely challenging set of over 8,000 original fire and smoke images captured and crowdsourced from over 1,200 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
0 PAPER • NO BENCHMARKS YET
This dataset consists of images of bottles and cups.
This dataset consists of images of cracked screens, such as cracked mobile phone screens.
This dataset is an extremely challenging set of over 5,000 original electronic item images captured and crowdsourced from over 1,000 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
This dataset is an extremely challenging set of over 5,000 original Hindi text images captured and crowdsourced from over 700 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
The dataset consists of images of human palms captured using a mobile phone. The images were taken in real-world scenarios, such as holding objects or performing simple gestures. The dataset has a wide variety of variations in illumination, distance, and more. It consists of images of 3 main gestures: frontal open palm, back open palm, and fist with the wrist. It also has many images of people wearing gloves.
H3D (Humans in 3D) is a dataset of annotated people. The annotations include:
This dataset is an extremely challenging set of over 5,000 original Indian food images captured and crowdsourced from over 800 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
Infinity AI's Spills Basic Dataset is a synthetic, open-source dataset for safety applications. It features 150 videos of photorealistic liquid spills across 15 common settings. Spills take on in-context reflections, caustics, and depth based on the surrounding environment, lighting, and floor. Each video contains a spill of unique properties (size, color, profile, and more) and is accompanied by pixel-perfect labels and annotations. This dataset can be used to develop computer vision algorithms to detect the location and type of spill from the perspective of a fixed camera.
This dataset is an extremely challenging set of over 7,000 original mask images captured and crowdsourced from over 1,200 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
This dataset is collected by DataCluster Labs, India. To download the full dataset or to submit a request for new data collection needs, please send a mail to sales@datacluster.ai. This dataset is an extremely challenging set of over 3,000 original mobile phone images captured and crowdsourced from over 1,000 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
The thickness and appearance of retinal layers are essential markers for diagnosing and studying eye diseases. Despite the increasing availability of imaging devices to scan and store large amounts of data, analyzing retinal images and generating trial endpoints has remained a manual, error-prone, and time-consuming task. In particular, the lack of large amounts of high-quality labels for different diseases hinders the development of automated algorithms. Therefore, we have compiled 5,016 pixel-wise manual labels for 1,672 optical coherence tomography (OCT) scans featuring two different diseases as well as healthy subjects, to help democratize the process of developing novel automatic techniques. We also collected 4,698 bounding box annotations for a subset of 566 scans across 9 classes of disease biomarker. Due to variations in retinal morphology, intensity range, and changes in contrast and brightness, designing segmentation and detection methods that can generalize to different diseases remains challenging.
This dataset is an extremely challenging set of over 2,000 original oximeter images captured and crowdsourced from over 300 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
SESYD (Systems Evaluation SYnthetic Documents) is a database of synthetic documents with ground truth. The database targets two main research problems in the document image analysis field: (i) symbol recognition and spotting in line-drawing images (floorplans and electrical diagrams), and (ii) character segmentation and recognition in geographical maps. The database is composed of eleven collections for performance evaluation, containing 284k images, 190k symbols, and 284k characters (k for thousand). Published in 2010, SESYD is today a key database in the document image analysis field, cited by around one hundred research papers.