no code implementations • LREC 2022 • Amir Hazem, Merieme Bouhandi, Florian Boudin, Beatrice Daille
Automatic Term Extraction (ATE) is a key component for domain knowledge understanding and an important basis for further natural language processing applications.
no code implementations • COLING 2020 • Martin Laville, Amir Hazem, Emmanuel Morin, Philippe Langlais
In this paper, we contrast several data selection techniques to improve bilingual lexicon induction from specialized comparable corpora.
1 code implementation • COLING 2020 • Amir Hazem, Beatrice Daille, Dominique Stutzmann, Christopher Kermorvant, Louis Chevalier
In this paper, we address the segmentation of books of hours, Latin devotional manuscripts of the late Middle Ages, that exhibit challenging issues: a complex hierarchical entangled structure, variable content, noisy transcriptions with no sentence markers, and strong correlations between sections for which topical information is no longer sufficient to draw segmentation boundaries.
no code implementations • LREC 2020 • Amir Hazem, Beatrice Daille, Claudia Lanza
Thesaurus construction with minimal human effort often relies on automatic methods to discover terms and their relations.
no code implementations • LREC 2020 • Amir Hazem, Mérieme Bouhandi, Florian Boudin, Beatrice Daille
Automatic terminology extraction is a notoriously difficult task that aims to reduce the effort required to manually identify terms in domain-specific corpora by automatically providing a ranked list of candidate terms.
no code implementations • LREC 2020 • Martin Laville, Amir Hazem, Emmanuel Morin
This paper describes the participation of the TALN/LS2N system in the Building and Using Comparable Corpora (BUCC) shared task.
no code implementations • RANLP 2019 • Amir Hazem, Nicolas Hernandez
In this paper, we propose a systematic study of the impact of the main word embedding models on sentence representation.
no code implementations • RANLP 2019 • Amir Hazem, Nicolas Hernandez
In this paper, we introduce the concept of disruption, which we define as a side effect of the training process of embedding models.
no code implementations • WS 2019 • Amir Hazem, Béatrice Daille, Dominique Stutzmann, Jacob Currie, Christine Jacquin
Based on the manual observation of 772 Obsecro Te copies, which show more than 21,000 variants, we show that the proposed methodology is helpful for an automatic study of variants and may serve as a basis for analyzing and depicting useful information from devotional texts.
no code implementations • JEPTALNRECITAL 2019 • Amir Hazem, Béatrice Daille, Dominique Stutzmann, Jacob Currie, Christine Jacquin
In this article, we address the problem of text reuse in the liturgical books of the Middle Ages.
no code implementations • COLING 2018 • Amir Hazem, Emmanuel Morin
For that purpose, we propose the first systematic evaluation of different word embedding models for bilingual terminology extraction from specialized comparable corpora.
no code implementations • IJCNLP 2017 • Amir Hazem
In this paper we present MappSent, a textual similarity approach that we applied to the multi-choice question answering in exams shared task.
no code implementations • IJCNLP 2017 • Amir Hazem, Emmanuel Morin
Bilingual lexicon extraction from comparable corpora is constrained by the small amount of available data when dealing with specialized domains.
no code implementations • RANLP 2017 • Amir Hazem, Basma El Amel Boussaha, Nicolas Hernandez
Since the advent of word embedding methods, the representation of longer pieces of text such as sentences and paragraphs has been gaining more and more interest, especially for textual similarity tasks.
no code implementations • COLING 2016 • Amir Hazem, Emmanuel Morin
Comparable corpora are the main alternative to the use of parallel corpora to extract bilingual lexicons.
no code implementations • LREC 2016 • Amir Hazem, Emmanuel Morin
There is a rich flora of word space models that have proven their efficiency in many different applications including information retrieval (Dumais, 1988), word sense disambiguation (Schutze, 1992), various semantic knowledge tests (Lund et al., 1995; Karlgren, 2001), and text categorization (Sahlgren, 2005).
no code implementations • LREC 2016 • Amir Hazem, Béatrice Daille
We also show that the adapted approach significantly improves bilingual lexicon extraction from comparable corpora compared to the approach at the word level.
no code implementations • LREC 2014 • Béatrice Daille, Amir Hazem
Automatic extraction of synonyms and semantically related words is a challenging task, useful in many NLP applications such as question answering, search query expansion, and text summarization.
no code implementations • LREC 2012 • Amir Hazem, Emmanuel Morin
One of the main resources used for the task of bilingual lexicon extraction from comparable corpora is the bilingual dictionary, which is considered a bridge between two languages.