The IAM database contains 13,353 images of handwritten lines of text created by 657 writers. The texts those writers transcribed are from the Lancaster-Oslo/Bergen Corpus of British English. In total, the database comprises 1,539 handwritten pages containing 115,320 words and is categorized as part of the modern collection. The database is labeled at the sentence, line, and word levels.
167 PAPERS • 2 BENCHMARKS
The RIMES database (Reconnaissance et Indexation de données Manuscrites et de fac similÉS / Recognition and Indexing of handwritten documents and faxes) was created to evaluate automatic systems for the recognition and indexing of handwritten letters. Of particular interest are letters such as those sent by postal mail or fax by individuals to companies or administrations.
7 PAPERS • NO BENCHMARKS YET
The Bentham manuscripts are a large set of documents written by the renowned English philosopher and reformer Jeremy Bentham (1748-1832). Volunteers of the Transcribe Bentham initiative transcribed this collection. Currently, more than 6 000 documents, or more than 25 000 pages, have been transcribed using this public web platform. For our experiments, we used the BenthamR0 dataset, a part of the Bentham manuscripts.
4 PAPERS • 1 BENCHMARK
The database is written in Cyrillic and shares the same 33 characters. Besides these characters, the Kazakh alphabet also contains 9 additional specific characters. This dataset is a collection of forms. The sources of all the forms in the dataset were generated with LaTeX and subsequently filled out by people in their own handwriting. The database consists of more than 1400 filled forms. There are approximately 63000 sentences and more than 715699 symbols, produced by approximately 200 different writers. We utilized three different datasets described as follows:
This dataset arises from the READ project (Horizon 2020).
The Konzil dataset was created by specialists at the University of Greifswald. It contains manuscripts written in modern German. The training set consists of 353 lines, the validation set of 29 lines, and the test set of 87 lines.
3 PAPERS • NO BENCHMARKS YET
Patzig contains handwritten texts written in modern German. The training set consists of 485 lines, the validation set of 38 lines, and the test set of 118 lines.
Ricordi contains handwritten texts written in Italian. The training set consists of 295 lines, the validation set of 19 lines, and the test set of 69 lines.
Schiller contains handwritten texts written in modern German. The training set consists of 244 lines, the validation set of 21 lines, and the test set of 63 lines.
Schwerin contains handwritten texts written in medieval German. The training set consists of 793 lines, the validation set of 68 lines, and the test set of 196 lines.
The Saint Gall dataset contains handwritten historical manuscripts written in Latin that date back to the 9th century. It consists of 60 pages, 1 410 text lines, and 11 597 words.
2 PAPERS • 1 BENCHMARK
MatriVasha is the largest dataset of handwritten Bangla compound characters, intended for research on handwritten Bangla compound character recognition. The proposed dataset contains 120 different types of compound characters in 306,464 images, of which 152,950 were written by male and 153,514 by female writers. The dataset can also be used for other research, such as gender-, age-, and district-based handwriting analysis, because the samples were collected with attention to district authenticity, age group, and an equal number of men and women.
1 PAPER • NO BENCHMARKS YET
A dataset with $23\,870$ digital trajectories (i.e. time series) of handwritten lower- and uppercase Latin letters and Arabic numerals ($a$-$z$, $A$-$Z$, $0$-$9$), generated by $77$ experts using a Wacom Pen Tablet. An expert is defined as a proficient user of the recorded symbols, in this case an adult native German speaker.
0 PAPERS • NO BENCHMARKS YET