Object detection models are usually evaluated by optimizing for a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are modelled in practice, e.g. satellite, microscopic and gaming imagery, making it difficult to assess how well a model generalizes.
4 PAPERS • 1 BENCHMARK
A dataset for flying honeybee detection introduced in "A Method for Detection of Small Moving Objects in UAV Videos".
1 PAPER • 1 BENCHMARK
USC-GRAD-STDdb comprises 115 video segments containing more than 25,000 annotated frames at HD 720p resolution (≈1280x720), with small objects of interest whose pixel area ranges from 16 (≈4x4) to 256 (≈16x16). Video length ranges from 150 to 500 frames. The size of each object is determined by its bounding box, so accurate annotation is essential for reliable performance metrics. Unsurprisingly, the smaller the object, the harder it is to annotate. The annotation was carried out with the ViTBAT tool, fitting the boxes as tightly as possible to the objects of interest in each video frame. In total, more than 56,000 ground truth labels have been generated.
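The size criterion above can be sketched as a simple area check on bounding boxes. This is only an illustrative sketch: the `Box` type, the `(x, y, width, height)` layout, the function names, and the inclusive treatment of the 16 and 256 pixel bounds are all assumptions made here, not the dataset's documented annotation format or tooling.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned box in pixels; not the dataset's actual format."""
    x: float
    y: float
    width: float
    height: float


# Area range quoted in the description: 16 (about 4x4) to 256 (about 16x16) pixels.
MIN_AREA = 16
MAX_AREA = 256


def pixel_area(box: Box) -> float:
    """Area of the bounding box in pixels."""
    return box.width * box.height


def is_small_object(box: Box, min_area: float = MIN_AREA, max_area: float = MAX_AREA) -> bool:
    """True if the box area lies in the small-object range (bounds assumed inclusive)."""
    return min_area <= pixel_area(box) <= max_area


def filter_small_objects(boxes: List[Box]) -> List[Box]:
    """Keep only the boxes whose area falls in the small-object range."""
    return [b for b in boxes if is_small_object(b)]


boxes = [Box(10, 10, 4, 4), Box(50, 60, 16, 16), Box(0, 0, 40, 30), Box(5, 5, 3, 4)]
small = filter_small_objects(boxes)
```

Because the check depends only on box width and height, a one-pixel annotation error changes the area of a 4x4 object by 25% or more, which is why tight boxes matter so much at this scale.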