no code implementations • 2 May 2024 • Christopher Kermorvant, Eva Bardou, Manon Blanco, Bastien Abadie
This paper presents Callico, a web-based open source platform designed to simplify the annotation process in document recognition projects.
no code implementations • 30 Apr 2024 • Solène Tarride, Christopher Kermorvant
In recent advances in automatic text recognition (ATR), deep neural networks have demonstrated the ability to implicitly capture language statistics, potentially reducing the need for traditional language models.
no code implementations • 29 Apr 2024 • Solène Tarride, Yoann Schneider, Marie Generali-Lince, Mélodie Boillet, Bastien Abadie, Christopher Kermorvant
PyLaia is one of the most popular open-source software packages for Automatic Text Recognition (ATR), delivering strong performance in terms of speed and accuracy.
no code implementations • 29 Apr 2024 • David Villanova-Aparisi, Solène Tarride, Carlos-D. Martínez-Hinarejos, Verónica Romero, Christopher Kermorvant, Moisés Pastor-Gadea
In this paper, we propose and publicly release a set of reading order independent metrics tailored to Information Extraction evaluation in handwritten documents.
no code implementations • 29 Apr 2024 • Mélodie Boillet, Solène Tarride, Yoann Schneider, Bastien Abadie, Lionel Kesztenbaum, Christopher Kermorvant
For this project, we developed a complete processing workflow: large-scale data collection from French departmental archives, collaborative annotation of documents, training of handwritten table text and structure recognition models, and mass processing of millions of images.
no code implementations • International Workshop on Historical Document Imaging and Processing 2023 • Solène Tarride, Tristan Faine, Mélodie Boillet, Harold Mouchère, Christopher Kermorvant
However, selecting training samples based on the degree of agreement between annotators introduces a bias in the training data and does not improve the results.
Ranked #1 on Handwritten Text Recognition on Belfort
no code implementations • 4 May 2023 • Vittorio Pippi, Silvia Cascianelli, Christopher Kermorvant, Rita Cucchiara
Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets.
no code implementations • 27 Apr 2023 • Solène Tarride, Martin Maarand, Mélodie Boillet, James McGrath, Eugénie Capel, Hélène Vézina, Christopher Kermorvant
Verification of the birth and death acts from this sample shows that 74% of them are considered complete and valid.
no code implementations • 26 Apr 2023 • Solène Tarride, Mélodie Boillet, Christopher Kermorvant
We propose a Transformer-based approach for information extraction from digitized handwritten documents.
no code implementations • 26 Apr 2023 • Solène Tarride, Mélodie Boillet, Jean-François Moufflet, Christopher Kermorvant
We propose a new database for information extraction from historical handwritten documents.
Ranked #1 on Key Information Extraction on SIMARA
no code implementations • 29 Aug 2022 • Mélodie Boillet, Christopher Kermorvant, Thierry Paquet
In the active learning framework, the first three estimators show a significant improvement in performance for the detection of document physical pages and text lines compared to a random selection of images.
no code implementations • 16 Aug 2022 • Silvia Cascianelli, Vittorio Pippi, Martin Maarand, Marcella Cornia, Lorenzo Baraldi, Christopher Kermorvant, Rita Cucchiara
With the aim of fostering research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years.
no code implementations • 23 Mar 2022 • Mélodie Boillet, Christopher Kermorvant, Thierry Paquet
We present a study conducted using three state-of-the-art systems (Doc-UFCN, dhSegment, and ARU-Net) and show that it is possible to build generic models trained on a wide variety of historical document datasets that can correctly segment diverse unseen pages.
no code implementations • 17 Sep 2021 • Mélodie Boillet, Martin Maarand, Thierry Paquet, Christopher Kermorvant
However, the segmentation of complex documents into semantic regions is sometimes impossible when relying only on visual features, and recent models embed both visual and textual information.
no code implementations • 28 Dec 2020 • Mélodie Boillet, Christopher Kermorvant, Thierry Paquet
In this paper, we introduce a fully convolutional network for the document layout analysis task.
no code implementations • 1 Dec 2020 • Mélodie Boillet, Marie-Laurence Bonhomme, Dominique Stutzmann, Christopher Kermorvant
We introduce in this paper a new dataset of annotated pages from books of hours, a type of handwritten prayer book owned and used by rich lay people in the late Middle Ages.
1 code implementation • COLING 2020 • Amir Hazem, Beatrice Daille, Dominique Stutzmann, Christopher Kermorvant, Louis Chevalier
In this paper, we address the segmentation of books of hours, Latin devotional manuscripts of the late Middle Ages, that exhibit challenging issues: a complex hierarchical entangled structure, variable content, noisy transcriptions with no sentence markers, and strong correlations between sections for which topical information is no longer sufficient to draw segmentation boundaries.
no code implementations • 27 Apr 2017 • Bastien Moysset, Christopher Kermorvant, Christian Wolf
Text line detection and localization is a crucial step for full-page document analysis, but it still suffers from the heterogeneity of real-life documents.
no code implementations • 5 Dec 2013 • Jérôme Louradour, Christopher Kermorvant
Recurrent Neural Networks (RNN) have recently achieved the best performance in off-line Handwriting Text Recognition.
no code implementations • 5 Nov 2013 • Vu Pham, Théodore Bluche, Christopher Kermorvant, Jérôme Louradour
Recurrent neural networks (RNNs) with Long Short-Term Memory cells currently hold the best known results in unconstrained handwriting recognition.