Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis

LREC 2016 · Amir Hazem, B{\'e}atrice Daille ·

Bilingual lexicon extraction from comparable corpora is usually based on distributional methods when dealing with single word terms (SWT). These methods often treat SWT as single tokens without considering their compositional property. However, many SWT are compositional (composed of roots and affixes) and this information, if taken into account can be very useful to match translational pairs, especially for infrequent terms where distributional methods often fail. For instance, the English compound \textit{xenograft} which is composed of the root \textit{xeno} and the lexeme \textit{graft} can be translated into French compositionally by aligning each of its elements (\textit{xeno} with \textit{x{\'e}no} and \textit{graft} with \textit{greffe}) resulting in the translation: \textit{x{\'e}nogreffe}. In this paper, we experiment several distributional modellings at the morpheme level that we apply to perform compositional translation to a subset of French and English compounds. We show promising results using distributional analysis at the root and affix levels. We also show that the adapted approach significantly improve bilingual lexicon extraction from comparable corpora compared to the approach at the word level.

PDF Abstract LREC 2016 PDF LREC 2016 Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Translation

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

No methods listed for this paper. Add relevant methods here

Edit Social Preview

Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove