Search Results for author: Gorka Labaka

Found 36 papers, 13 papers with code

Unsupervised Machine Translation in Real-World Scenarios

no code implementations • LREC 2022 • Ona de Gibert Bonet, Iakes Goenaga, Jordi Armengol-Estapé, Olatz Perez-de-Viñaspre, Carla Parra Escartín, Marina Sanchez, Mārcis Pinnis, Gorka Labaka, Maite Melero

In this work, we present the work that has been carried on in the MT4All CEF project and the resources that it has generated by leveraging recent research carried out in the field of unsupervised learning.

Translation Unsupervised Machine Translation


TANDO: A Corpus for Document-level Machine Translation

1 code implementation • LREC 2022 • Harritxu Gete, Thierry Etchegoyhen, David Ponce, Gorka Labaka, Nora Aranberri, Ander Corral, Xabier Saralegi, Igor Ellakuria, Maite Martin

Document-level Neural Machine Translation aims to increase the quality of neural translation models by taking into account contextual information.

Document Level Machine Translation Machine Translation +2


Comparing and combining tagging with different decoding algorithms for back-translation in NMT: learnings from a low resource scenario

no code implementations • EAMT 2022 • Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz

Recently, diverse approaches have been proposed to get better automatic evaluation results of NMT models using back-translation, including the use of sampling instead of beam search as decoding algorithm for creating the synthetic corpus.

Machine Translation NMT +2


Label Verbalization and Entailment for Effective Zero and Few-Shot Relation Extraction

1 code implementation • EMNLP 2021 • Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre

In our experiments on TACRED we attain 63% F1 zero-shot, 69% with 16 examples per relation (17% points better than the best supervised system on the same conditions), and only 4 points short to the state-of-the-art (which uses 20 times more training data).

Natural Language Inference Relation +1

148


Ixamed’s submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation

no code implementations • WMT (EMNLP) 2020 • Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz

Regarding the techniques used, we base on the findings from our previous works for translating clinical texts into Basque, making use of clinical terminology for adapting the MT systems to the clinical domain.

Domain Adaptation Machine Translation +1


Principled Paraphrase Generation with Parallel Corpora

1 code implementation • ACL 2022 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.

Machine Translation Paraphrase Generation +1


Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction

1 code implementation • 8 Sep 2021 • Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre

Ranked #10 on Relation Extraction on TACRED

Natural Language Inference Relation +1

148


Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring

no code implementations • ACL 2021 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2


Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

no code implementations • ACL 2020 • Ivana Kvapilikova, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages.

Language Modelling Parallel Corpus Mining +4


Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring

no code implementations • 31 Dec 2020 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2


A Call for More Rigor in Unsupervised Cross-lingual Learning

no code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.

Cross-Lingual Word Embeddings Position +3


Translation Artifacts in Cross-lingual Transfer Learning

1 code implementation • EMNLP 2020 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique.

Cross-Lingual Transfer Machine Translation +3


Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation

no code implementations • 28 Feb 2020 • Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre

In this paper, we analyze the role that such initialization plays in iterative back-translation.

NMT Translation +1


Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish

no code implementations • WS 2019 • Xabier Soto, Olatz Perez-de-Viñaspre, Maite Oronoz, Gorka Labaka

Machine Translation Translation


Bilingual Lexicon Induction through Unsupervised Machine Translation

1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods.

Bilingual Lexicon Induction Language Modelling +6

227


Analyzing the Limitations of Cross-lingual Word Embedding Mappings

no code implementations • ACL 2019 • Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +1


An Effective Approach to Unsupervised Machine Translation

1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only.

Ranked #1 on Unsupervised Machine Translation on WMT2014 English-German

NMT Translation +1

227


Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation

2 code implementations • CONLL 2018 • Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness.

Word Embeddings


Unsupervised Statistical Machine Translation

3 code implementations • EMNLP 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018).

Ranked #3 on Machine Translation on WMT2014 French-English

Language Modelling NMT +2

640


A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings

2 code implementations • ACL 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training.

Cross-Lingual Word Embeddings Self-Learning +1

640


Konbitzul: an MWE-specific database for Spanish-Basque

no code implementations • LREC 2018 • Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Machine Translation


Building Named Entity Recognition Taggers via Parallel Corpora

1 code implementation • LREC 2018 • Rodrigo Agerri, Yi-Ling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

Machine Translation named-entity-recognition +4


Unsupervised Neural Machine Translation

2 code implementations • ICLR 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho

In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs.

Ranked #6 on Machine Translation on WMT2015 English-German

NMT Translation +1

2,153


Learning bilingual word embeddings with (almost) no bilingual data

no code implementations • ACL 2017 • Mikel Artetxe, Gorka Labaka, Eneko Agirre

Most methods to learn bilingual word embeddings rely on large parallel corpora, which is difficult to obtain for most language pairs.

Document Classification Entity Linking +5


Rule-Based Translation of Spanish Verb-Noun Combinations into Basque

no code implementations • WS 2017 • Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque.

Machine Translation Translation


Using Linguistic Data for English and Spanish Verb-Noun Combination Identification

no code implementations • COLING 2016 • Uxoa Iñurrieta, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola, Itziar Aduriz, John Carroll

We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method to use this information to improve VNC identification.

Chunking Machine Translation