SUT (SUT: a new multi-purpose synthetic dataset for Farsi document image analysis)

This paper introduces SUT, a new large-scale dataset of Farsi document images. It addresses the difficulty of obtaining diverse and substantial ground-truth data for supervised models in document image analysis (DIA) tasks such as document image classification, text detection and recognition, and information retrieval. The dataset comprises 62,453 images in 21 distinct classes, including identity documents with synthetically generated personal information superimposed on various backgrounds. Each image has corresponding labeling information: the ground truth is organized in CSV files that contain the image file paths and associated information about the embedded data.
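As a rough illustration of how ground truth stored this way might be consumed, the sketch below reads a CSV of image paths and labels and counts the images per class. The column names (`image_path`, `class_label`), the file paths, and the class names are hypothetical placeholders, since the description above does not specify the actual CSV schema; adjust them to match the released files.

```python
import csv
import io
from collections import Counter


def load_ground_truth(csv_file):
    """Read ground-truth rows from an open CSV file object into a list of dicts."""
    return list(csv.DictReader(csv_file))


def count_per_class(rows, label_column="class_label"):
    """Count how many images belong to each class.

    `label_column` is an assumed column name, not taken from the dataset.
    """
    return Counter(row[label_column] for row in rows)


# Tiny in-memory stand-in for one ground-truth file.
# Column names, paths, and class names here are all hypothetical.
SAMPLE_CSV = """image_path,class_label
images/doc_0001.jpg,class_a
images/doc_0002.jpg,class_b
images/doc_0003.jpg,class_a
"""

rows = load_ground_truth(io.StringIO(SAMPLE_CSV))
class_counts = count_per_class(rows)

for label, count in sorted(class_counts.items()):
    print(f"{label}: {count} image(s)")
```

With a real file, the `io.StringIO` stand-in would be replaced by something like `open(path, newline="", encoding="utf-8")`, where the UTF-8 encoding is also an assumption about how the Farsi text fields are stored.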

Benchmarks

Task	Dataset Variant	Best Model
Optical Character Recognition (OCR)	SUT	Tesseract
Document Image Classification	SUT	CNN
