no code implementations • NLP4DH (ICON) 2021 • Thibault Clérice
Literary works may contain many autonomous or semi-autonomous units, such as poems for the former or chapters for the latter.
1 code implementation • 25 Sep 2023 • Thibault Clérice
In this study, we propose to evaluate the use of deep learning methods for semantic classification at the sentence level to accelerate the process of corpus building in the field of humanities and linguistics, a traditional and time-consuming task.
1 code implementation • 19 Jul 2022 • Thibault Clérice
Layout Analysis (the identification of zones and their classification) is, along with line segmentation, the first step in Optical Character Recognition and similar tasks.
1 code implementation • 23 Sep 2021 • Jean-Baptiste Camps, Thibault Clérice, Frédéric Duval, Lucence Ing, Naomi Kanaoka, Ariane Pinche
Old French is a typical example of an under-resourced historical language that also displays a significant amount of linguistic variation.
1 code implementation • 7 Dec 2020 • Jean-Baptiste Camps, Thibault Clérice, Ariane Pinche
Stylometric analysis of medieval vernacular texts is still a significant challenge: the importance of scribal variation, be it spelling or more substantial, as well as the variants and errors introduced in the tradition, complicate the task of the would-be stylometrist.
no code implementations • 22 Nov 2020 • Simon Gabay, Thibault Clérice, Jean-Baptiste Camps, Jean-Baptiste Tanguy, Matthias Gille-Levenson
With the development of big corpora of various periods, it becomes crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) to increase the interoperability of the data produced, despite diachronic variations.
no code implementations • 15 May 2020 • Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero
This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse.