The semantic line (SEL) dataset contains 1,750 outdoor images in total, which are split into 1,575 training and 175 testing images. Each semantic line is annotated by the coordinates of the two end-points on an image boundary. If an image has a single dominant line, it is set as the ground truth primary semantic line. If an image has multiple semantic lines, the line with the best rank by human annotators is set as the ground-truth primary line, and the others as additional ground-truth semantic lines. In SEL, 61% of the images contain multiple semantic lines.
5 PAPERS • 1 BENCHMARK
NKL (short for NanKai Lines) is a dataset for semantic line detection. Semantic lines are meaningful line structures that outline the conceptual structure of natural images. The NKL dataset contains 5,000 images of various scenes. Each of these images is annotated by multiple skilled human annotators. The dataset is split into training and validation subsets. There are 4,000 images in the training set and 1,000 in the validation set.
3 PAPERS • 1 BENCHMARK
The UrduDoc Dataset is a benchmark dataset for Urdu text line detection in scanned documents. It is created as a byproduct of the UTRSet-Real dataset generation process. Comprising 478 diverse images collected from various sources such as books, documents, manuscripts, and newspapers, it offers a valuable resource for research in Urdu document analysis. It includes 358 pages for training and 120 pages for validation, featuring a wide range of styles, scales, and lighting conditions. It serves as a benchmark for evaluating printed Urdu text detection models, and the benchmark results of state-of-the-art models are provided. The Contour-Net model demonstrates the best performance in terms of h-mean.
1 PAPER • 1 BENCHMARK