Search Results for author: Gorka Labaka

Found 36 papers, 13 papers with code

Unsupervised Machine Translation in Real-World Scenarios

no code implementations LREC 2022 Ona de Gibert Bonet, Iakes Goenaga, Jordi Armengol-Estapé, Olatz Perez-de-Viñaspre, Carla Parra Escartín, Marina Sanchez, Mārcis Pinnis, Gorka Labaka, Maite Melero

In this work, we present the work carried out in the MT4All CEF project and the resources it has generated by leveraging recent research in the field of unsupervised learning.

Translation · Unsupervised Machine Translation

Comparing and combining tagging with different decoding algorithms for back-translation in NMT: learnings from a low resource scenario

no code implementations EAMT 2022 Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz

Recently, diverse approaches have been proposed to improve the automatic evaluation results of NMT models trained with back-translation, including the use of sampling instead of beam search as the decoding algorithm for creating the synthetic corpus.

Machine Translation · NMT +2

Ixamed’s submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation

no code implementations WMT (EMNLP) 2020 Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz

Regarding the techniques used, we build on the findings from our previous work on translating clinical texts into Basque, making use of clinical terminology to adapt the MT systems to the clinical domain.

Domain Adaptation · es-en +2

Label Verbalization and Entailment for Effective Zero and Few-Shot Relation Extraction

1 code implementation EMNLP 2021 Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre

In our experiments on TACRED we attain 63% F1 zero-shot and 69% with 16 examples per relation (17 points better than the best supervised system under the same conditions), and fall only 4 points short of the state-of-the-art (which uses 20 times more training data).
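The core idea — verbalizing relation labels as natural-language hypotheses and scoring them with an entailment model — can be sketched as follows. The templates and the relation names are illustrative, and the scoring function is a naive word-overlap stand-in for the pretrained NLI model the paper actually uses.

```python
# Hypothetical relation templates in the style of TACRED labels.
TEMPLATES = {
    "per:city_of_birth": "{subj} was born in {obj}.",
    "org:founded_by": "{subj} was founded by {obj}.",
    "no_relation": "{subj} and {obj} are not related.",
}

def entailment_score(premise, hypothesis):
    """Stand-in for an NLI model returning P(entailment | premise, hypothesis).
    Naive word overlap is used here purely for illustration."""
    p = {w.strip(".,").lower() for w in premise.split()}
    h = {w.strip(".,").lower() for w in hypothesis.split()}
    return len(p & h) / len(h)

def predict_relation(sentence, subj, obj):
    """Verbalize every label and pick the hypothesis the sentence entails best."""
    scores = {
        label: entailment_score(sentence, tpl.format(subj=subj, obj=obj))
        for label, tpl in TEMPLATES.items()
    }
    return max(scores, key=scores.get)

print(predict_relation("John was born in London last century.", "John", "London"))
```

Because the entailment model carries the semantics, new relations can be added zero-shot by writing a template, which is what makes the few-shot regime in the paper so data-efficient.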

Natural Language Inference · Relation +1

Principled Paraphrase Generation with Parallel Corpora

1 code implementation ACL 2022 Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.
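Round-trip paraphrasing reduces to composing two translation directions. The dictionaries below are a toy word-for-word stand-in for trained MT systems; the words and language pair are chosen only for illustration.

```python
# Toy word-for-word "translators"; real round-trip MT would use trained models.
EN2ES = {"the": "el", "movie": "filme", "film": "filme",
         "was": "fue", "great": "estupendo"}
ES2EN = {"el": "the", "filme": "film", "fue": "was", "estupendo": "great"}

def translate(sentence, table):
    return " ".join(table.get(w, w) for w in sentence.split())

def round_trip_paraphrase(sentence):
    """Paraphrase by pivoting through another language: en -> es -> en.
    Synonyms that share a pivot translation ('movie'/'film' -> 'filme')
    come back as each other, which is the supervision signal exploited."""
    return translate(translate(sentence, EN2ES), ES2EN)

print(round_trip_paraphrase("the movie was great"))  # -> "the film was great"
```

The weakness the paper addresses is visible even here: the pivot tends to preserve surface form, so round-trip outputs stay close to the input rather than diversifying freely.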

Diversity · Machine Translation +2

A Call for More Rigor in Unsupervised Cross-lingual Learning

no code implementations ACL 2020 Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.

Cross-Lingual Word Embeddings · Position +3

Translation Artifacts in Cross-lingual Transfer Learning

1 code implementation EMNLP 2020 Mikel Artetxe, Gorka Labaka, Eneko Agirre

Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique.

Cross-Lingual Transfer · Machine Translation +3

Bilingual Lexicon Induction through Unsupervised Machine Translation

1 code implementation ACL 2019 Mikel Artetxe, Gorka Labaka, Eneko Agirre

A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods.
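The nearest-neighbor retrieval step the snippet describes is simple to sketch. The embeddings below are tiny made-up vectors assumed to already live in a shared cross-lingual space; real systems would use mapped monolingual embeddings with hundreds of dimensions.

```python
import math

# Tiny invented cross-lingual embeddings, already mapped to a shared space.
EN = {"dog": [1.0, 0.1], "house": [0.1, 1.0]}
ES = {"perro": [0.9, 0.2], "casa": [0.2, 0.9], "gato": [0.8, 0.6]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def induce_translation(word):
    """Induce a translation pair by nearest-neighbor retrieval in the shared space."""
    vec = EN[word]
    return max(ES, key=lambda w: cosine(vec, ES[w]))

print(induce_translation("dog"))   # -> "perro"
print(induce_translation("house")) # -> "casa"
```

The paper's contribution is to replace this retrieval step with an unsupervised machine translation system; the sketch shows only the baseline it improves upon.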

Bilingual Lexicon Induction · Language Modelling +6

Analyzing the Limitations of Cross-lingual Word Embedding Mappings

no code implementations ACL 2019 Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.

Bilingual Lexicon Induction · Cross-Lingual Word Embeddings +1

An Effective Approach to Unsupervised Machine Translation

1 code implementation ACL 2019 Mikel Artetxe, Gorka Labaka, Eneko Agirre

While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only.

NMT · Translation +1

Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation

2 code implementations CoNLL 2018 Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness.

Word Embeddings

Unsupervised Statistical Machine Translation

3 code implementations EMNLP 2018 Mikel Artetxe, Gorka Labaka, Eneko Agirre

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018).

Language Modelling · NMT +2

A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings

2 code implementations ACL 2018 Mikel Artetxe, Gorka Labaka, Eneko Agirre

Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training.
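The self-learning idea behind this line of work — alternate between fitting an orthogonal mapping from a small dictionary and re-inducing a better dictionary with that mapping — can be sketched in two dimensions, where the optimal rotation has a closed form. The vocabularies, vectors, and the hidden rotation angle below are all invented for illustration.

```python
import math

def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

# Toy monolingual embeddings; the "Spanish" space is the English one
# rotated by a hidden angle (0.5 rad) that the method must recover.
EN = {"dog": [1.0, 0.0], "house": [0.0, 1.0], "tree": [0.6, 0.8]}
TRUE_THETA = 0.5
ES = {"perro": rotate(EN["dog"], TRUE_THETA),
      "casa": rotate(EN["house"], TRUE_THETA),
      "arbol": rotate(EN["tree"], TRUE_THETA)}

def fit_rotation(pairs):
    """2-D Procrustes: the rotation best aligning source to target pairs."""
    num = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    den = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    return math.atan2(num, den)

def nearest(vec, space):
    return max(space, key=lambda w: sum(a * b for a, b in zip(vec, space[w])))

# Self-learning: start from a single seed pair, then alternate between
# fitting the mapping and re-inducing the dictionary with it.
dictionary = [(EN["dog"], ES["perro"])]
for _ in range(3):
    theta = fit_rotation(dictionary)
    dictionary = [(EN[w], ES[nearest(rotate(EN[w], theta), ES)]) for w in EN]

print(round(theta, 6))  # recovers the hidden rotation, 0.5
print({w: nearest(rotate(EN[w], theta), ES) for w in EN})
```

The paper's actual contribution is a fully unsupervised initialization plus robustness tricks that let this loop converge without any seed dictionary at all; the sketch starts from one seed pair only to keep the toy self-contained.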

Cross-Lingual Word Embeddings · Self-Learning +1

Unsupervised Neural Machine Translation

2 code implementations ICLR 2018 Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho

In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs.

Decoder · NMT +2

Learning bilingual word embeddings with (almost) no bilingual data

no code implementations ACL 2017 Mikel Artetxe, Gorka Labaka, Eneko Agirre

Most methods to learn bilingual word embeddings rely on large parallel corpora, which are difficult to obtain for most language pairs.

Document Classification · Entity Linking +5

Rule-Based Translation of Spanish Verb-Noun Combinations into Basque

no code implementations WS 2017 Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque.

Machine Translation · Translation

Using Linguistic Data for English and Spanish Verb-Noun Combination Identification

no code implementations COLING 2016 Uxoa Iñurrieta, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola, Itziar Aduriz, John Carroll

We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method to use this information to improve VNC identification.

Chunking · Machine Translation

Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation

no code implementations LREC 2016 Gorka Labaka, Iñaki Alegria, Kepa Sarasola

This paper presents how a state-of-the-art SMT system is enriched by using an extra in-domain parallel corpus extracted from Wikipedia.

Domain Adaptation
