PENELOPIE: Enabling Open Information Extraction for the Greek Language through Machine Translation
In this paper we present our submission for the EACL 2021 SRW: a methodology that aims to bridge the gap between high- and low-resource languages in the context of Open Information Extraction (OIE), showcased on the Greek language. The goals of this paper are twofold. First, we build Neural Machine Translation (NMT) models for English-to-Greek and Greek-to-English based on the Transformer architecture. Second, we leverage these NMT models to produce English translations of Greek text as input for our NLP pipeline, apply a series of pre-processing and triple-extraction steps, and back-translate the extracted triples into Greek. We evaluate both our NMT and OIE methods on benchmark datasets and demonstrate that our approach outperforms the current state of the art for Greek.
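To make the pipeline concrete, here is a minimal sketch of the translate-extract-back-translate loop described above. The `Helsinki-NLP/opus-mt-el-en` and `Helsinki-NLP/opus-mt-en-el` checkpoints are assumed public stand-ins for the paper's own Transformer NMT models, and `extract_triples_en` is a hypothetical placeholder for the English triple-extraction step:

```python
# Sketch of the PENELOPIE-style pipeline: translate Greek to English,
# extract OIE triples in English, then back-translate each triple element.
# Model names are assumptions (public OPUS-MT checkpoints), not the
# paper's own trained models.
from transformers import MarianMTModel, MarianTokenizer

def load_nmt(model_name):
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return tokenizer, model

def translate(text, tokenizer, model):
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)

el2en = load_nmt("Helsinki-NLP/opus-mt-el-en")  # assumption: stand-in EL->EN model
en2el = load_nmt("Helsinki-NLP/opus-mt-en-el")  # assumption: stand-in EN->EL model

def extract_triples_en(sentence):
    """Hypothetical placeholder for the English OIE step; a real system
    would plug in a triple extractor returning (subject, relation, object)."""
    raise NotImplementedError

def greek_oie(greek_sentence):
    english = translate(greek_sentence, *el2en)   # 1. translate EL -> EN
    triples = extract_triples_en(english)         # 2. extract triples in English
    # 3. back-translate each triple element into Greek
    return [tuple(translate(part, *en2el) for part in triple)
            for triple in triples]
```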
Results from the Paper
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| Open Information Extraction | CaRB OIE benchmark (Greek use-case) | PENELOPIE Greek OIE | F1 | 0.255 | #1 | |
| Machine Translation | Tatoeba (EL-to-EN) | PENELOPIE (Transformers-based Greek-to-English NMT) | BLEU | 79.3 | #1 | |
| Machine Translation | Tatoeba (EN-to-EL) | PENELOPIE Transformers-based NMT (EN2EL) | BLEU | 76.9 | #1 | Yes |
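For context, the BLEU figures above are corpus-level scores on the Tatoeba test pairs; comparable numbers can be computed with a standard scorer such as sacrebleu (an assumption; the page does not name the scoring tool). The sentence pair below is an illustrative placeholder, not Tatoeba data:

```python
# Minimal sketch: corpus-level BLEU with sacrebleu (assumed scorer).
import sacrebleu

hypotheses = ["Η γάτα κάθεται στο χαλί."]         # system translations
references = [["Η γάτα κάθεται πάνω στο χαλί."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```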