no code implementations • 19 Jan 2022 • Christian Reul, Stefan Tomasek, Florian Langhanki, Uwe Springmann
We report on our efforts to construct mixed recognition models which can be applied out of the box without any further document-specific training, but which also serve as a starting point for fine-tuning, i.e. training a new model on a few pages of transcribed text (ground truth).
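As a sketch of that fine-tuning step, the loop below adapts a pretrained line recognizer with a CTC head to a handful of transcribed pages. It assumes a PyTorch model and data loader, and the checkpoint name is hypothetical; this is not the authors' actual tooling:

```python
import torch
import torch.nn as nn

# Hypothetical checkpoint of a mixed model trained on many typefaces.
model = torch.load("mixed_model.pt")
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
# Small learning rate: we adapt the mixed model, we do not retrain it.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def finetune(loader, epochs=10):
    """Adapt the mixed model on a few pages of transcribed lines."""
    model.train()
    for _ in range(epochs):
        for images, targets, input_lens, target_lens in loader:
            log_probs = model(images).log_softmax(2)  # (T, N, C) for CTC
            loss = ctc_loss(log_probs, targets, input_lens, target_lens)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```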
no code implementations • 15 Jun 2021 • Christian Reul, Christoph Wick, Maximilian Nöth, Andreas Büttner, Maximilian Wehner, Uwe Springmann
Training a more specialized model for some unseen Early Modern Latin books starting from our mixed model led to a CER of 1.47%, an improvement of up to 50% compared to training from scratch and up to 30% compared to training from the aforementioned standard model.
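The character error rate (CER) quoted here is the edit distance between recognized text and ground truth, divided by the length of the ground truth. A self-contained way to compute it:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n] / max(m, 1)

# One substituted character in a 21-character line: CER of about 4.8%.
print(cer("Die alte Druckschrift", "Die alte Druckschrlft"))
```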
no code implementations • 9 Sep 2019 • Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe
Nevertheless, in the last few years great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout recognition and segmentation, character recognition and post-processing.
Optical Character Recognition (OCR)
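The four stages named in the abstract chain naturally into a pipeline. A minimal sketch with hypothetical stage signatures; the actual open-source tools each expose their own interfaces:

```python
from typing import Callable, List

Image = bytes                             # stand-in for pixel data
Preprocess = Callable[[Image], Image]     # e.g. binarization, deskewing
Segment = Callable[[Image], List[Image]]  # layout analysis -> line images
Recognize = Callable[[Image], str]        # character recognition per line
Postprocess = Callable[[str], str]        # e.g. lexical post-correction

def run_pipeline(page: Image, pre: Preprocess, seg: Segment,
                 rec: Recognize, post: Postprocess) -> List[str]:
    """Chain the four stages of a typical historical-OCR workflow."""
    return [post(rec(line)) for line in seg(pre(page))]
```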
1 code implementation • 8 Oct 2018 • Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur scripts without book-specific training using mixed models, i.e. models trained to recognize a variety of fonts and typesets from previously unseen sources.
Optical Character Recognition (OCR)
no code implementations • 14 Sep 2018 • Uwe Springmann, Christian Reul, Stefanie Dipper, Johannes Baiter
In this paper we describe a dataset of German and Latin ground truth (GT) for historical OCR in the form of printed text line images paired with their transcription.
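A common on-disk convention for such line-level ground truth is a transcription file stored next to each line image. The loader below assumes the `.png` / `.gt.txt` pairing used by several OCR tools; the dataset's actual layout may differ:

```python
from pathlib import Path

def load_ground_truth(root: str):
    """Pair each line image with its transcription file.

    Assumes the common convention of a `.gt.txt` transcription next
    to each `.png` line image.
    """
    pairs = []
    for gt_file in Path(root).rglob("*.gt.txt"):
        img_file = gt_file.with_name(gt_file.name.replace(".gt.txt", ".png"))
        if img_file.exists():
            text = gt_file.read_text(encoding="utf-8").strip()
            pairs.append((img_file, text))
    return pairs

pairs = load_ground_truth("GT4HistOCR")
print(f"{len(pairs)} transcribed line images")
```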
1 code implementation • 27 Feb 2018 • Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
We combine three methods which significantly improve the accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch.
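The core of the pretraining idea is to keep the body of an existing mixed model and rebuild only its output layer when the new book's alphabet (codec) differs. A hedged PyTorch sketch, where `model.fc` and the codec lists are assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

def adapt_output_layer(model: nn.Module, old_codec: list, new_codec: list):
    """Keep output weights for characters shared by both alphabets;
    rows for characters the mixed model has never seen start fresh."""
    old_fc: nn.Linear = model.fc  # assumed final projection layer
    new_fc = nn.Linear(old_fc.in_features, len(new_codec))
    with torch.no_grad():
        for i, ch in enumerate(new_codec):
            if ch in old_codec:
                j = old_codec.index(ch)
                new_fc.weight[i] = old_fc.weight[j]
                new_fc.bias[i] = old_fc.bias[j]
    model.fc = new_fc
    return model
```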
1 code implementation • 15 Dec 2017 • Christian Reul, Christoph Wick, Uwe Springmann, Frank Puppe
The evaluation on seven early printed books showed that training from the Latin mixed model reduces the average number of errors by 43% and 26% compared to training from scratch with 60 and 150 lines of ground truth, respectively.
1 code implementation • 27 Nov 2017 • Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
Experiments on seven early printed books show that the proposed method considerably outperforms the standard approach, reducing the number of errors by 50% and more.
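The method trains several models on cross folds of the ground truth and combines their outputs by confidence voting. The sketch below is a deliberately simplified line-level stand-in; the actual approach aligns the outputs and votes on the character level:

```python
from collections import Counter

def vote_line(predictions: list[tuple[str, float]]) -> str:
    """Combine the outputs of several cross-fold models for one line:
    each model votes with its text, weighted by its own confidence."""
    scores = Counter()
    for text, confidence in predictions:
        scores[text] += confidence
    return scores.most_common(1)[0][0]

# Three fold-models disagree; the two confident ones outvote the third.
print(vote_line([("Gottes", 0.95), ("Gottes", 0.90), ("Got tes", 0.60)]))
```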
2 code implementations • 20 Jan 2017 • Christian Reul, Uwe Springmann, Frank Puppe
A semi-automatic open-source tool for layout analysis on early printed books is presented.
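At its core, such layout analysis starts from connected components of the binarized page. A very reduced sketch of that first step, using SciPy; the tool itself adds region classification and interactive correction on top:

```python
import numpy as np
from scipy import ndimage

def find_text_regions(binarized: np.ndarray, min_area: int = 500):
    """Bounding boxes of sufficiently large connected components.

    `binarized` holds 1 for ink pixels and 0 for background; the
    area threshold filters out specks and marginal noise.
    """
    labeled, _ = ndimage.label(binarized)
    regions = []
    for slc in ndimage.find_objects(labeled):
        h = slc[0].stop - slc[0].start
        w = slc[1].stop - slc[1].start
        if h * w >= min_area:
            regions.append((slc[0].start, slc[1].start, h, w))
    return regions
```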
no code implementations • 19 Jan 2017 • Florian Fink, Klaus-U. Schulz, Uwe Springmann
Here we improve this method in three respects. First, the method of Reffle (2013) is not adaptive: user feedback obtained in actual postcorrection steps cannot be used to compute refined profiles.
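Making the profiles adaptive means folding user corrections back into the profiler's statistics. A toy illustration that counts character confusions from corrected tokens; the real profiler estimates weighted edit operations, not raw counts:

```python
from collections import Counter

def update_profile(profile: Counter, ocr_token: str, corrected: str):
    """Fold a user's correction back into the error profile, so the
    estimate of typical OCR confusions adapts to the document.
    Toy model: count substitutions at aligned positions only."""
    for o, c in zip(ocr_token, corrected):
        if o != c:
            profile[(o, c)] += 1
    return profile

profile = Counter()
update_profile(profile, "fhip", "ship")  # long s misread as 'f'
update_profile(profile, "fun", "sun")
print(profile.most_common())             # [(('f', 's'), 2)]
```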