3 dataset results for Handwritten Text Recognition AND Russian

The IAM database contains 13,353 images of handwritten lines of text created by 657 writers. The texts those writers transcribed are from the Lancaster-Oslo/Bergen Corpus of British English. It includes contributions from 657 writers making a total of 1,539 handwritten pages comprising of 115,320 words and is categorized as part of modern collection. The database is labeled at the sentence, line, and word levels.

167 PAPERS • 2 BENCHMARKS

HKR (Handwritten Kazakh and Russian (HKR) Database for Text Recognition)

The database is written in Cyrillic and shares the same 33 characters. Besides these characters, the Kazakh alphabet also contains 9 additional specific characters. This dataset is a collection of forms. The sources of all the forms in the datasets were generated by LATEX which subsequently was filled out by persons with their handwriting. The database consists of more than 1400 filled forms. There are approximately 63000 sentences, more than 715699 symbols produced by approximately 200 diferent writers. We utilized three different datasets described as following:

4 PAPERS • 1 BENCHMARK

Digital Peter

Digital Peter is a dataset of Peter the Great's manuscripts annotated for segmentation and text recognition. The dataset may be useful for researchers to train handwriting text recognition models as a benchmark for comparing different models. It consists of 9,694 images and text files corresponding to lines in historical documents. The dataset includes Peter’s handwritten materials covering the period from 1709 to 1713.

3 PAPERS • 1 BENCHMARK

Datasets

3 dataset results for Handwritten Text Recognition AND Russian