Search Results for author: Thibault Clérice

Found 7 papers, 4 papers with code

Detecting Sexual Content at the Sentence Level in First Millennium Latin Texts

1 code implementation • 25 Sep 2023 • Thibault Clérice

In this study, we propose to evaluate the use of deep learning methods for semantic classification at the sentence level to accelerate the process of corpus building in the field of humanities and linguistics, a traditional and time-consuming task.

Sentence, Sentence Classification

You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine

1 code implementation • 19 Jul 2022 • Thibault Clérice

Layout Analysis (the identification of zones and their classification) is the first step, along with line segmentation, in Optical Character Recognition and similar tasks.

Classification, object-detection, +4

Corpus and Models for Lemmatisation and POS-tagging of Old French

1 code implementation • 23 Sep 2021 • Jean-Baptiste Camps, Thibault Clérice, Frédéric Duval, Lucence Ing, Naomi Kanaoka, Ariane Pinche

Old French is a typical example of an under-resourced historic language, one that furthermore displays an important amount of linguistic variation.

POS, POS Tagging

Stylometry for Noisy Medieval Data: Evaluating Paul Meyer's Hagiographic Hypothesis

1 code implementation • 7 Dec 2020 • Jean-Baptiste Camps, Thibault Clérice, Ariane Pinche

Stylometric analysis of medieval vernacular texts is still a significant challenge: the importance of scribal variation, be it spelling or more substantial, as well as the variants and errors introduced in the tradition, complicate the task of the would-be stylometrist.

Handwritten Text Recognition

Standardizing linguistic data: method and tools for annotating (pre-orthographic) French

no code implementations • 22 Nov 2020 • Simon Gabay, Thibault Clérice, Jean-Baptiste Camps, Jean-Baptiste Tanguy, Matthias Gille-Levenson

With the development of big corpora of various periods, it becomes crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) to increase the interoperability of the data produced, despite diachronic variations.

POS

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

no code implementations • 15 May 2020 • Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero

This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse.

POS, POS Tagging
