Search Results for author: Simon Gabay

Found 8 papers, 1 paper with code

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

no code implementations 15 May 2020 Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero

This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse.

POS POS Tagging

Standardizing linguistic data: method and tools for annotating (pre-orthographic) French

no code implementations22 Nov 2020 Simon Gabay, Thibault Clérice, Jean-Baptiste Camps, Jean-Baptiste Tanguy, Matthias Gille-Levenson

With the development of big corpora of various periods, it becomes crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) to increase the interoperability of the data produced, despite diachronic variations.

POS

From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French

no code implementations18 Feb 2022 Simon Gabay, Pedro Ortiz Suarez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette, Benoît Sagot

Because these historical states are at the same time more complex to process and more scarce in the corpora available, specific efforts are necessary to train natural language processing (NLP) tools adapted to the data.

Language Modelling Part-Of-Speech Tagging +1

Automatic Normalisation of Early Modern French

1 code implementation LREC 2022 Rachel Bawden, Jonathan Poinhos, Eleni Kogkitsidou, Philippe Gambette, Benoît Sagot, Simon Gabay

Spelling normalisation is a useful step in the study and analysis of historical language texts, whether it is manual analysis by experts or automatic analysis using downstream natural language processing (NLP) tools.

From FreEM to D’AlemBERT: a Large Corpus and a Language Model for Early Modern French

no code implementations LREC 2022 Simon Gabay, Pedro Ortiz Suarez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette, Benoît Sagot

Language models for historical states of language are becoming increasingly important to allow the optimal digitisation and analysis of old textual sources.

Language Modelling

Le projet FREEM : ressources, outils et enjeux pour l’étude du français d’Ancien Régime (The FREEM project: Resources, tools and challenges for the study of Ancien Régime French)

no code implementations JEP/TALN/RECITAL 2022 Simon Gabay, Pedro Ortiz Suarez, Rachel Bawden, Alexandre Bartz, Philippe Gambette, Benoît Sagot

Despite their undeniable quality, the resources and tools available for analysing Ancien Régime French are no longer able to meet the challenges of research in linguistics and literature for this period.

A Data-driven Approach to Named Entity Recognition for Early Modern French

no code implementations COLING 2022 Pedro Ortiz Suarez, Simon Gabay

However, instead of developing a specialised architecture to tackle the particularities of this state of language, we opt for a data-driven approach by developing a new corpus with fine-grained entity annotation, covering three centuries of literature corresponding to the early modern period; we try to annotate as much data as possible, producing a corpus that is many times bigger than the most popular NER evaluation corpora for both Contemporary English and French.

Named Entity Recognition +1
