no code implementations • LREC 2022 • Ona de Gibert Bonet, Iakes Goenaga, Jordi Armengol-Estapé, Olatz Perez-de-Viñaspre, Carla Parra Escartín, Marina Sanchez, Mārcis Pinnis, Gorka Labaka, Maite Melero
In this work, we present the work carried out in the MT4All CEF project and the resources it has generated by leveraging recent research in unsupervised learning.
1 code implementation • LREC 2022 • Harritxu Gete, Thierry Etchegoyhen, David Ponce, Gorka Labaka, Nora Aranberri, Ander Corral, Xabier Saralegi, Igor Ellakuria, Maite Martin
Document-level Neural Machine Translation aims to increase the quality of neural translation models by taking into account contextual information.
no code implementations • EAMT 2022 • Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz
Recently, diverse approaches have been proposed to improve the automatic evaluation results of NMT models that use back-translation, including the use of sampling instead of beam search as the decoding algorithm for creating the synthetic corpus.
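As a rough illustration of the decoding difference at stake, the sketch below contrasts beam search and sampling when generating synthetic back-translation data with the Hugging Face transformers API; the model name is an assumed stand-in, not the system used in the paper.

```python
# Illustrative sketch only: beam search vs. sampling when back-translating
# text to build a synthetic corpus. The model name is a placeholder,
# not the paper's actual system.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-en-eu"  # assumed back-translation model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("The patient was discharged after treatment.", return_tensors="pt")

# Beam search: deterministic, favours frequent, "safe" outputs.
beam = model.generate(**inputs, num_beams=5, do_sample=False, max_new_tokens=64)

# Sampling: stochastic, yields more diverse synthetic sentences,
# which is the alternative the entry above refers to.
sampled = model.generate(**inputs, do_sample=True, top_k=50, max_new_tokens=64)

print(tokenizer.decode(beam[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```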
no code implementations • WMT (EMNLP) 2020 • Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz
Regarding the techniques used, we build on the findings of our previous work on translating clinical texts into Basque, making use of clinical terminology to adapt the MT systems to the clinical domain.
1 code implementation • EMNLP 2021 • Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre
In our experiments on TACRED we attain 63% F1 zero-shot and 69% F1 with 16 examples per relation (17 percentage points better than the best supervised system under the same conditions), falling only 4 points short of the state-of-the-art (which uses 20 times more training data).
1 code implementation • ACL 2022 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.
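A minimal sketch of the round-trip idea the entry describes: translate a sentence into a pivot language and back, and treat the result as a paraphrase. The OPUS-MT checkpoints named here are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of round-trip MT paraphrasing: English -> pivot -> English.
# The OPUS-MT checkpoints are illustrative choices, not the paper's setup.
from transformers import pipeline

en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def round_trip_paraphrase(sentence: str) -> str:
    pivot = en_to_fr(sentence)[0]["translation_text"]
    return fr_to_en(pivot)[0]["translation_text"]

print(round_trip_paraphrase("The committee approved the proposal yesterday."))
```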
1 code implementation • 8 Sep 2021 • Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre
In our experiments on TACRED we attain 63% F1 zero-shot and 69% F1 with 16 examples per relation (17 percentage points better than the best supervised system under the same conditions), falling only 4 points short of the state-of-the-art (which uses 20 times more training data).
Ranked #11 on Relation Extraction on TACRED
no code implementations • ACL 2021 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.
Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2
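For context on the mapping family of methods the entry above refers to, here is a schematic sketch of offline alignment via orthogonal Procrustes; the random arrays are stand-ins for real pretrained embeddings, and row i of X and Y is assumed to be a seed dictionary pair.

```python
# Schematic sketch: align two independently trained embedding spaces with
# an orthogonal transformation (Procrustes). Random arrays stand in for
# real embeddings; paired rows play the role of a seed dictionary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))  # source-language seed embeddings
Y = rng.normal(size=(1000, 300))  # target-language seed embeddings

# Orthogonal Procrustes solution: W = U V^T, where U S V^T = SVD(X^T Y).
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

X_mapped = X @ W  # source embeddings mapped into the target space
```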
no code implementations • ACL 2020 • Ivana Kvapilikova, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar
Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages.
no code implementations • 31 Dec 2020 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.
Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2
no code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre
We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.
1 code implementation • EMNLP 2020 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique.
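As a minimal sketch of the "translate-test" variant of the transfer technique mentioned above: a model trained on one language is applied to another by machine-translating the test set first. The `translate` and `classify` callables are placeholders for a real MT system and a trained task model.

```python
# Sketch of "translate-test" cross-lingual transfer. `translate` and
# `classify` are placeholders for a real MT system and task model.
def translate_test(test_sentences, translate, classify):
    """Translate each test sentence into the training language, then classify."""
    return [classify(translate(s)) for s in test_sentences]

# Usage (with hypothetical callables):
# predictions = translate_test(basque_test_set, mt_eu_to_en, english_classifier)
```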
no code implementations • 28 Feb 2020 • Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre
In this paper, we analyze the role that such initialization plays in iterative back-translation.
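For readers unfamiliar with the procedure being analyzed, here is iterative back-translation in schematic form; `train` and the two initial models are placeholders for real training and translation routines, and the loop structure is a generic sketch rather than the paper's exact setup.

```python
# Schematic form of iterative back-translation. `train`, `model_st`, and
# `model_ts` are placeholders; the initialization of the two models is
# exactly what the entry above studies.
def iterative_back_translation(mono_src, mono_tgt, model_st, model_ts, train, rounds=3):
    for _ in range(rounds):
        # Back-translate target monolingual text into synthetic sources,
        # then retrain the source->target model on the synthetic pairs.
        synthetic_src = [model_ts(t) for t in mono_tgt]
        model_st = train(synthetic_src, mono_tgt)
        # Symmetrically for the other direction.
        synthetic_tgt = [model_st(s) for s in mono_src]
        model_ts = train(synthetic_tgt, mono_src)
    return model_st, model_ts
```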
1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods.
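A minimal sketch of the nearest-neighbor retrieval step described above, assuming the two embedding spaces have already been aligned; random arrays stand in for the mapped embedding matrices.

```python
# Sketch: inducing a bilingual lexicon by nearest-neighbor retrieval over
# already-aligned cross-lingual embeddings. Random arrays stand in for
# the real mapped embedding matrices.
import numpy as np

rng = np.random.default_rng(0)
src_emb = rng.normal(size=(5000, 300))  # mapped source-language embeddings
tgt_emb = rng.normal(size=(5000, 300))  # target-language embeddings

# Cosine similarity via row normalization and a dot product.
src_n = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
tgt_n = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
similarities = src_n @ tgt_n.T

# Each source word's induced translation is its nearest target neighbor.
induced_pairs = similarities.argmax(axis=1)
```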
no code implementations • ACL 2019 • Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre
Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.
Bilingual Lexicon Induction Cross-Lingual Word Embeddings +1
1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only.
2 code implementations • CONLL 2018 • Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre
Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness.
3 code implementations • EMNLP 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018).
Ranked #3 on Machine Translation on WMT2014 French-English
2 code implementations • ACL 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training.
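The adversarial approach referred to in the entry above (proposed in prior work, which this paper builds on and compares against) can be sketched schematically as follows; random tensors stand in for real pretrained embeddings, and the hyperparameters are illustrative.

```python
# Schematic sketch of adversarial embedding mapping: a linear mapping tries
# to make mapped source embeddings indistinguishable from target embeddings,
# while a discriminator tries to tell them apart. Random tensors stand in
# for real pretrained embeddings.
import torch
import torch.nn as nn

dim = 300
mapping = nn.Linear(dim, dim, bias=False)
discriminator = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, 1))
opt_m = torch.optim.SGD(mapping.parameters(), lr=0.1)
opt_d = torch.optim.SGD(discriminator.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

src = torch.randn(10000, dim)  # source-language embeddings
tgt = torch.randn(10000, dim)  # target-language embeddings

for step in range(100):
    s = src[torch.randint(0, len(src), (32,))]
    t = tgt[torch.randint(0, len(tgt), (32,))]
    # Discriminator step: mapped source is "fake" (0), target is "real" (1).
    d_loss = loss_fn(discriminator(mapping(s).detach()), torch.zeros(32, 1)) \
           + loss_fn(discriminator(t), torch.ones(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Mapping step: fool the discriminator into calling mapped source "real".
    m_loss = loss_fn(discriminator(mapping(s)), torch.ones(32, 1))
    opt_m.zero_grad(); m_loss.backward(); opt_m.step()
```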
2 code implementations • ICLR 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho
In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs.
Ranked #6 on Machine Translation on WMT2015 English-German
no code implementations • ACL 2017 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Most methods to learn bilingual word embeddings rely on large parallel corpora, which is difficult to obtain for most language pairs.
no code implementations • WS 2017 • Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque.
no code implementations • COLING 2016 • Uxoa Iñurrieta, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola, Itziar Aduriz, John Carroll
We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method to use this information to improve VNC identification.
no code implementations • WS 2016 • Rosa Gaudio, Gorka Labaka, Eneko Agirre, Petya Osenova, Kiril Simov, Martin Popel, Dieke Oele, Gertjan van Noord, Luís Gomes, João António Rodrigues, Steven Neale, João Silva, Andreia Querido, Nuno Rendeiro, António Branco
no code implementations • LREC 2016 • Gorka Labaka, Iñaki Alegria, Kepa Sarasola
This paper presents how a state-of-the-art SMT system is enriched by using extra in-domain parallel corpora extracted from Wikipedia.