The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
11,433 PAPERS • 96 BENCHMARKS
Planar object tracking is an actively studied problem in vision-based robotic applications. While several benchmarks have been constructed for evaluating state-of-the-art algorithms, there is a lack of video sequences captured in the wild rather than in constrained laboratory environments. In this paper, we present a carefully designed planar object tracking benchmark containing 210 videos of 30 planar objects sampled in the natural environment. In particular, for each object, we shoot seven videos involving various challenging factors, namely scale change, rotation, perspective distortion, motion blur, occlusion, out-of-view, and unconstrained. The ground truth is carefully annotated semi-manually to ensure quality. Moreover, eleven state-of-the-art algorithms are evaluated on the benchmark using two evaluation metrics, with detailed analysis provided for the evaluation results. We expect the proposed benchmark to benefit future studies on planar object tracking.
7 PAPERS • NO BENCHMARKS YET
Synthetic COCO (S-COCO) is a synthetically created dataset for homography estimation learning. Introduced by DeTone et al., it generates the source and target candidates by duplicating the same COCO image. The source patch $I_S$ is generated by randomly cropping the source candidate at position $p$ with a size of $128 \times 128$ pixels. The patch’s corners are then randomly perturbed vertically and horizontally by values within the range $[-\rho, \rho]$, and the four resulting correspondences define a homography $H_{ST}$. The inverse of this homography, $H_{TS} = (H_{ST})^{-1}$, is applied to the target candidate, and a target patch $I_T$ is cropped from the resulting warped image at the same location $p$. Both $I_S$ and $I_T$ serve as the input data, with the homography $H_{ST}$ as ground truth.
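The geometric part of the S-COCO procedure can be sketched in a few lines. The following is a minimal, standard-library-only illustration, not the authors' implementation: the function names, the default $\rho = 32$, and the rule for choosing $p$ so that perturbed corners stay inside the image are all assumptions made for this example. It samples the patch position, perturbs the four corners, and solves for $H_{ST}$ from the four correspondences with $h_{33}$ fixed to 1. The image warp itself is omitted.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(src, dst):
    """3x3 homography mapping four src points onto four dst points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, pt):
    """Map a 2D point through h using homogeneous coordinates."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


def sample_scoco_homography(img_w, img_h, patch=128, rho=32, rng=random):
    """Sample position p, perturb the patch corners, and return H_ST.

    patch=128 follows the description above; rho=32 is an assumed default.
    """
    # Keep a margin of rho so perturbed corners remain inside the image.
    x0 = rng.randint(rho, img_w - patch - rho)
    y0 = rng.randint(rho, img_h - patch - rho)
    corners = [(x0, y0), (x0 + patch, y0),
               (x0 + patch, y0 + patch), (x0, y0 + patch)]
    perturbed = [(x + rng.randint(-rho, rho), y + rng.randint(-rho, rho))
                 for x, y in corners]
    h_st = homography_from_points(corners, perturbed)
    return (x0, y0), corners, perturbed, h_st
```

To complete a training pair, the target candidate would be warped with the inverse of the returned matrix and both patches cropped at the same position, for example with an image library routine such as OpenCV's `warpPerspective`.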
7 PAPERS • 1 BENCHMARK
YUD+ is a dataset containing additional Vanishing Point Labels for the York Urban Database.
6 PAPERS • NO BENCHMARKS YET
NYU-VP is a new dataset for multi-model fitting, specifically vanishing point (VP) estimation. Each image is annotated with up to eight vanishing points, and pre-extracted line segments are provided to act as data points for a robust estimator. Due to its size, the dataset is the first to allow for supervised learning of a multi-model fitting task.
4 PAPERS • NO BENCHMARKS YET
The Photometrically Distorted Synthetic COCO (PDS-COCO) dataset is a synthetically created dataset for homography estimation learning. The procedure is the same as for the Synthetic COCO (S-COCO) dataset, with SSD-like photometric distortion added at the very beginning. First, the brightness of the image is adjusted by a randomly picked value $\delta_b \sim \mathcal{U}(-32, 32)$. Next, contrast, saturation, and hue noise are applied with the following values: $\delta_c \sim \mathcal{U}(0.5, 1.5)$, $\delta_s \sim \mathcal{U}(0.5, 1.5)$, and $\delta_h \sim \mathcal{U}(-18, 18)$. Finally, the color channels of the image are randomly swapped with a probability of $0.5$. This photometric distortion procedure is applied to the original image twice, independently, to create the source and target candidates.
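The distortion steps listed for PDS-COCO can be illustrated with a small standard-library sketch. It is not the authors' code, and several details are assumptions made for this example: the function names, treating brightness as an additive offset on 0 to 255 values, contrast and saturation as multiplicative factors, $\delta_h$ as an offset in degrees on a 360-degree hue circle, and the channel swap as a random permutation of the three channels.

```python
import colorsys
import random


def sample_distortion(rng=random):
    """Draw one set of photometric distortion parameters."""
    swap = rng.random() < 0.5  # channels are swapped with probability 0.5
    return {
        "delta_b": rng.uniform(-32, 32),    # brightness offset
        "delta_c": rng.uniform(0.5, 1.5),   # contrast factor
        "delta_s": rng.uniform(0.5, 1.5),   # saturation factor
        "delta_h": rng.uniform(-18, 18),    # hue offset (assumed degrees)
        "perm": tuple(rng.sample(range(3), 3)) if swap else (0, 1, 2),
    }


def clip(value, lo=0.0, hi=255.0):
    """Clamp value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def distort_pixel(rgb, params):
    """Apply brightness, contrast, saturation, hue, then channel swap."""
    # Brightness and contrast in RGB space, clipped to the valid range.
    r, g, b = (clip((c + params["delta_b"]) * params["delta_c"]) for c in rgb)
    # Saturation and hue in HSV space (colorsys works on values in [0, 1]).
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    s = clip(s * params["delta_s"], 0.0, 1.0)
    h = (h + params["delta_h"] / 360.0) % 1.0
    out = [c * 255.0 for c in colorsys.hsv_to_rgb(h, s, v)]
    # Reorder the channels according to the sampled permutation.
    return tuple(out[i] for i in params["perm"])


def distort_image(pixels, params):
    """Distort an image given as rows of (r, g, b) tuples in [0, 255]."""
    return [[distort_pixel(px, params) for px in row] for row in pixels]
```

Calling `sample_distortion()` twice and applying each parameter set to the same original image yields independently distorted source and target candidates, matching the last step of the description.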
4 PAPERS • 1 BENCHMARK
To collect thermal aerial data, we used FLIR's Boson thermal imager (8.7 mm focal length, 640p resolution, and $50^\circ$ horizontal field of view; see https://www.flir.es/products/boson/). The collected images are nadir views at approximately 1 m/px spatial resolution. We performed six flights from 9:00 PM to 4:00 AM and accordingly label this dataset Boson-nighttime. To create a single map, we first run a structure-from-motion (SfM) algorithm to reconstruct the thermal map from multiple views. Subsequently, orthorectification is performed by aligning the photometric satellite maps with the thermal maps at the same spatial resolution. The ground area covered by Boson-nighttime measures $33~\text{km}^2$ in total. The most prevalent map feature is desert, with small portions of farms, roads, and buildings.
2 PAPERS • NO BENCHMARKS YET
Consolidates the World Cup 2014 (WC14) and Time-Series World Cup (TSWC) datasets and refines their homography annotations.
A large video dataset with dynamic content.
1 PAPER • NO BENCHMARKS YET