Search Results for author: Gaël Lejeune

Found 20 papers, 0 papers with code

Multilingual Epidemiological Text Classification: A Comparative Study

no code implementations COLING 2020 Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Adam Jatowt, Gaël Lejeune, Moses Odeo

We conduct a comparative study of different machine and deep learning text classification models using a dataset comprising news articles related to epidemic outbreaks from six languages, four low-resourced and two high-resourced, in order to analyze the influence of the nature of the language, the structure of the document, and the size of the data.

Multilingual text classification text-classification +1

Calcul de similarité entre phrases : quelles mesures et quels descripteurs ? (Sentence Similarity: a study on similarity metrics with words and character strings)

no code implementations JEPTALNRECITAL 2020 Davide Buscaldi, Ghazi Felhi, Dhaou Ghoul, Joseph Le Roux, Gaël Lejeune, Xu-Dong Zhang

In this work we addressed two questions: the choice of the similarity measure on the one hand, and the choice of the operands to which the similarity measure is applied on the other.

Sentence Sentence Similarity

Que recèlent les données textuelles issues du web ? (What do text data from the Web have to hide?)

no code implementations JEPTALNRECITAL 2020 Adrien Barbaresi, Gaël Lejeune

The opportunistic collection and use of textual data taken from the web raise a series of ethical, methodological, and epistemological problems that deserve the attention of the scientific community.

Out-of-the-Box and into the Ditch? Multilingual Evaluation of Generic Text Extraction Tools

no code implementations LREC 2020 Adrien Barbaresi, Gaël Lejeune

This article examines extraction methods designed to retain the main text content of web pages and discusses how the extraction could be oriented and evaluated: can and should it be as generic as possible to ensure opportunistic corpus construction?

A Dataset for Multi-lingual Epidemiological Event Extraction

no code implementations LREC 2020 Stephen Mutuvi, Antoine Doucet, Gaël Lejeune, Moses Odeo

This paper proposes a corpus for the development and evaluation of tools and techniques for identifying emerging infectious disease threats in online news text.

Event Extraction text-classification +1

Dating Ancient texts: an Approach for Noisy French Documents

no code implementations LREC 2020 Anaëlle Baledent, Nicolas Hiebel, Gaël Lejeune

The experiments presented in this article focused on documents written in French but we believe that the ability of character-level models to handle noise properly would help to achieve comparable results on other languages and more ancient languages in particular.

Document Dating POS

MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)

no code implementations WS 2019 Dhaou Ghoul, Gaël Lejeune

We present MICHAEL, a simple, lightweight method for automatic Arabic dialect identification, applied to the MADAR travel domain Dialect Identification (DID) task.

Dialect Identification General Classification

Indexation et appariements de documents cliniques pour le Deft 2019 (Indexing and pairing texts of the medical domain)

no code implementations JEPTALNRECITAL 2019 Davide Buscaldi, Dhaou Ghoul, Joseph Le Roux, Gaël Lejeune

For the indexing task we tested two methods: one based on first pairing the documents of the test set with the documents of the training set, and another based on terminological annotation.

Modèles en Caractères pour la Détection de Polarité dans les Tweets (Character-level Models for Polarity Detection in Tweets)

no code implementations JEPTALNRECITAL 2018 Davide Buscaldi, Joseph Le Roux, Gaël Lejeune

Our first method is based on lexicons (words and emojis), character n-grams, and a large-margin classifier (SVM).

Character Based Pattern Mining for Neology Detection

no code implementations WS 2017 Gaël Lejeune, Emmanuel Cartier

In this paper, neology detection is considered as a classification task where a system has to assess whether a given lexical item is an actual neologism or not.

General Classification

Ambiguity Diagnosis for Terms in Digital Humanities

no code implementations LREC 2016 Béatrice Daille, Evelyne Jacquey, Gaël Lejeune, Luis Felipe Melo, Yannick Toussaint

Even when a lexical unit is indeed a term of the domain, not all of its occurrences are terminological, even in a specialised corpus.

Word Sense Disambiguation

Vers un diagnostic d'ambiguïté des termes candidats d'un texte (Towards a diagnosis of the ambiguity of candidate terms in a text)

no code implementations JEPTALNRECITAL 2015 Gaël Lejeune, Béatrice Daille

In this article, we focus on the ambiguity of a term within a specialised domain.

Évaluation intrinsèque et extrinsèque du nettoyage de pages Web (Intrinsic and extrinsic evaluation of Web page cleaning)

no code implementations JEPTALNRECITAL 2015 Gaël Lejeune, Romain Brixtel, Charlotte Lecluze

We propose two types of evaluation for this boilerplate-removal task: (I) an intrinsic evaluation based on the content in words, tags, and characters; (II) an extrinsic, task-based evaluation that examines the effect of boilerplate removal on the system placed downstream in the processing pipeline.
