no code implementations • EMNLP (NLP-COVID19) 2020 • Arantxa Otegi, Jon Ander Campos, Gorka Azkune, Aitor Soroa, Eneko Agirre
In this paper we present a quantitative and qualitative analysis of the system.
1 code implementation • EMNLP 2021 • Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre
In our experiments on TACRED we attain 63% F1 zero-shot, 69% with 16 examples per relation (17 points better than the best supervised system under the same conditions), and only 4 points short of the state-of-the-art (which uses 20 times more training data).
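The reformulation behind these numbers can be sketched as follows: each candidate relation is verbalized as a natural-language hypothesis about the entity pair, and an entailment model picks the relation whose hypothesis is best supported by the sentence. The templates and the scorer below are illustrative stand-ins, not the paper's exact prompts or NLI model.

```python
# Relation extraction as textual entailment: verbalize each relation as a
# hypothesis and keep the one most entailed by the sentence (the premise).
TEMPLATES = {
    "per:city_of_birth": "{subj} was born in {obj}.",
    "org:founded_by": "{subj} was founded by {obj}.",
    "no_relation": "{subj} and {obj} are unrelated.",
}

def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: scores lexical overlap. A real system
    would use the entailment probability of an NLI-finetuned model."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h)

def classify(sentence: str, subj: str, obj: str) -> str:
    hypotheses = {rel: tpl.format(subj=subj, obj=obj)
                  for rel, tpl in TEMPLATES.items()}
    return max(hypotheses, key=lambda r: toy_entailment_score(sentence, hypotheses[r]))

print(classify("Pasteur was born in Dole.", "Pasteur", "Dole"))
# per:city_of_birth
```

Because the hypotheses are plain text, new relations can be added without retraining, which is what makes the zero-shot and few-shot settings possible.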
1 code implementation • 25 Feb 2025 • Ahmed Elhady, Eneko Agirre, Mikel Artetxe
The performance of the models drops 12.1 points on average with respect to the original versions of the datasets.
no code implementations • 31 Jul 2024 • Oscar Sainz, Iker García-Ferrero, Alon Jacovi, Jon Ander Campos, Yanai Elazar, Eneko Agirre, Yoav Goldberg, Wei-Lin Chen, Jenny Chim, Leshem Choshen, Luca D'Amico-Wong, Melissa Dell, Run-Ze Fan, Shahriar Golchin, Yucheng Li, PengFei Liu, Bhavish Pahwa, Ameya Prabhu, Suryansh Sharma, Emily Silcock, Kateryna Solonko, David Stap, Mihai Surdeanu, Yu-Min Tseng, Vishaal Udandarao, Zengzhi Wang, Ruijie Xu, Jinglin Yang
The workshop fostered a shared task to collect evidence on data contamination in current available datasets and models.
1 code implementation • 14 Jun 2024 • Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune
The novelty of BiVLC is to add a synthetic hard negative image generated from the synthetic text, resulting in two image-to-text retrieval examples (one for each image) and, more importantly, two text-to-image retrieval examples (one for each text).
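The expansion from one annotated quadruple into four retrieval instances can be made concrete with a small sketch; the field names are illustrative, not BiVLC's actual schema.

```python
# From one BiVLC-style quadruple (image, caption, hard negative image,
# hard negative caption) build the two image-to-text and two
# text-to-image retrieval instances described above.
def retrieval_instances(image, caption, neg_image, neg_caption):
    i2t = [  # image-to-text: choose the right caption for each image
        {"query": image, "positive": caption, "negative": neg_caption},
        {"query": neg_image, "positive": neg_caption, "negative": caption},
    ]
    t2i = [  # text-to-image: choose the right image for each caption
        {"query": caption, "positive": image, "negative": neg_image},
        {"query": neg_caption, "positive": neg_image, "negative": image},
    ]
    return i2t, t2i

i2t, t2i = retrieval_instances("img.jpg", "a dog on a sofa",
                               "img_neg.jpg", "a sofa on a dog")
print(len(i2t), len(t2i))  # 2 2
```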
no code implementations • 9 Apr 2024 • Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle, Eneko Agirre
To perform the experiments we introduce EusIE, an event extraction dataset for Basque, which follows the Multilingual Event Extraction dataset (MEE).
1 code implementation • 29 Mar 2024 • Julen Etxaniz, Oscar Sainz, Naiara Perez, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters.
1 code implementation • 20 Mar 2024 • Gorka Azkune, Ander Salaberria, Eneko Agirre
This paper shows that text-only Language Models (LM) can learn to ground spatial relations like "left of" or "below" if they are provided with explicit location information of objects and they are properly trained to leverage those locations.
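A minimal sketch of the idea of verbalizing object locations for a text-only LM: boxes are (x_min, y_min, x_max, y_max) with the origin at the top-left, a common convention, and the relation is read off the box centers. The exact textual format used in the paper may differ.

```python
# Derive a spatial relation from object bounding boxes and verbalize it,
# so a text-only language model sees explicit location information.
def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def spatial_relation(box_a, box_b):
    """Name the relation of object A with respect to object B."""
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):  # horizontal offset dominates
        return "left of" if dx < 0 else "right of"
    return "above" if dy < 0 else "below"  # y grows downwards

def verbalize(name_a, box_a, name_b, box_b):
    rel = spatial_relation(box_a, box_b)
    return f"The {name_a} at {box_a} is {rel} the {name_b} at {box_b}."

print(verbalize("cup", (10, 40, 30, 60), "plate", (50, 40, 90, 60)))
```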
1 code implementation • 1 Mar 2024 • Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre, Frank Keller
We hypothesize that this is because explicit spatial relations rarely appear in the image captions used to train these models.
1 code implementation • 16 Nov 2023 • Iñigo Alonso, Eneko Agirre, Mirella Lapata
Table-to-text generation involves generating appropriate textual descriptions given structured tabular data.
2 code implementations • 27 Oct 2023 • Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre
In this position paper, we argue that the classical evaluation on Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble.
1 code implementation • 26 Oct 2023 • Iñigo Alonso, Eneko Agirre
Table-to-text systems generate natural language statements from structured data like tables.
no code implementations • 13 Oct 2023 • Carlos Dominguez, Jon Ander Campos, Eneko Agirre, Gorka Azkune
We focus on the BEIR benchmark, which includes test datasets from several domains with no training data, and explore two scenarios: zero-shot, where the supervised system is trained on a large out-of-domain dataset (MS-MARCO); and unsupervised domain adaptation, where, in addition to MS-MARCO, the system is fine-tuned on synthetic data from the target domain.
1 code implementation • 5 Oct 2023 • Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre
In this paper, we propose GoLLIE (Guideline-following Large Language Model for IE), a model able to improve zero-shot results on unseen IE tasks by virtue of being fine-tuned to comply with annotation guidelines.
Ranked #1 on Zero-shot Named Entity Recognition (NER) on HarveyNER (using extra training data)
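A sketch of what a guideline-following IE prompt can look like: the annotation guidelines are written as Python class definitions with docstrings, and the model is asked to produce instantiations. The schema and prompt layout below are illustrative, not GoLLIE's exact format.

```python
# Build a prompt in which annotation guidelines appear as code, so a
# code-pretrained LLM can be fine-tuned to comply with them.
class Launcher:
    """An organization or person that sends a rocket or satellite into
    space. Do not tag manufacturers that only build the hardware."""

def build_prompt(text: str, schema: list) -> str:
    guidelines = "\n\n".join(
        f'class {cls.__name__}:\n    """{cls.__doc__}"""' for cls in schema
    )
    return f"{guidelines}\n\n# Text: {text}\n# Annotations:\nresult ="

prompt = build_prompt("SpaceX launched Starlink from Florida.", [Launcher])
print("Do not tag manufacturers" in prompt)  # True
```

Encoding guidelines this way lets the same fine-tuned model follow new, unseen schemas at inference time, which is the source of the zero-shot gains.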
no code implementations • 23 May 2023 • Aitor Ormazabal, Mikel Artetxe, Eneko Agirre
Methods for adapting language models (LMs) to new tasks and domains have traditionally assumed white-box access to the model, and work by modifying its parameters.
no code implementations • 7 Feb 2023 • Oscar Sainz, Oier Lopez de Lacalle, Eneko Agirre, German Rigau
Language Models are at the core of almost any Natural Language Processing system nowadays.
1 code implementation • 16 Dec 2022 • Rodrigo Agerri, Eneko Agirre
Given the impact of language models on the field of Natural Language Processing, a number of Spanish encoder-only masked language models (aka BERTs) have been trained and released.
1 code implementation • ACL 2022 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.
1 code implementation • 24 May 2022 • Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre
During inference, we build control codes for the desired meter and rhyme scheme, and condition our language model on them to generate formal verse poetry.
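The conditioning step can be sketched as a control-code prefix prepended to the prompt; the code tokens and layout here are illustrative, and `generate` stands in for any decoder-only language model.

```python
# Build control codes for the desired meter and rhyme scheme and condition
# generation on them by prepending them to the prompt.
def make_control_prefix(meter: str, rhyme_scheme: str) -> str:
    # One code token per formal constraint; the model attends to these
    # codes while producing every token of the poem.
    return f"<meter:{meter}> <rhyme:{rhyme_scheme}>"

def generate(model, meter: str, rhyme_scheme: str, topic: str) -> str:
    prompt = f"{make_control_prefix(meter, rhyme_scheme)} {topic}\n"
    return model(prompt)  # any callable LM wrapper

prefix = make_control_prefix("hendecasyllable", "ABBA")
print(prefix)  # <meter:hendecasyllable> <rhyme:ABBA>
```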
1 code implementation • Findings (NAACL) 2022 • Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre
In this work we show that entailment is also effective in Event Argument Extraction (EAE), reducing the need for manual annotation to 50% and 20% in ACE and WikiEvents respectively, while achieving the same performance as with full training.
Ranked #1 on Event Argument Extraction on WikiEvents
2 code implementations • NAACL (ACL) 2022 • Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, Bonan Min
The current workflow for Information Extraction (IE) analysts involves the definition of the entities/relations of interest and a training corpus with annotated examples.
no code implementations • 1 Nov 2021 • Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heinz, Dan Roth
Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field.
1 code implementation • 15 Sep 2021 • Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre
Our results on a visual question answering task which requires external knowledge (OK-VQA) show that our text-only model outperforms pretrained multimodal (image-text) models with a comparable number of parameters.
1 code implementation • 8 Sep 2021 • Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre
In our experiments on TACRED we attain 63% F1 zero-shot, 69% with 16 examples per relation (17 points better than the best supervised system under the same conditions), and only 4 points short of the state-of-the-art (which uses 20 times more training data).
Ranked #11 on Relation Extraction on TACRED
no code implementations • ACL 2021 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.
Bilingual Lexicon Induction • Cross-Lingual Word Embeddings • +2
no code implementations • ACL 2020 • Ivana Kvapilikova, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar
Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages.
1 code implementation • 1 Feb 2021 • Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre
Previous work did not use the caption text information, but a manually provided relation holding between the subject and the object.
no code implementations • 31 Dec 2020 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.
Bilingual Lexicon Induction • Cross-Lingual Word Embeddings • +2
1 code implementation • COLING 2020 • Jon Ander Campos, Kyunghyun Cho, Arantxa Otegi, Aitor Soroa, Gorka Azkune, Eneko Agirre
The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility.
1 code implementation • EMNLP 2020 • Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak
In this work, we introduce \emph{Spot The Bot}, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots.
no code implementations • ACL 2020 • Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs.
no code implementations • 4 May 2020 • Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs.
no code implementations • LREC 2020 • Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre
Conversational Question Answering (CQA) systems meet user information needs by having conversations with them, where answers to the questions are retrieved from text.
no code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre
We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.
no code implementations • ACL 2020 • Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, Mark Cieliebak
For this, we introduce an intermediate representation, called Operation Trees (OT), that is based on the logical query plan in a database.
1 code implementation • EMNLP 2020 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique.
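The two machine-translation-based transfer strategies mentioned here can be sketched side by side; `translate` is a toy stand-in for any MT system, and the language codes are placeholders.

```python
# Translate-train (translate the training set into the target language)
# vs translate-test (translate the test set into the source language).
def translate(texts, src, tgt):
    # Toy stand-in: tags each text instead of really translating it.
    return [f"[{src}->{tgt}] {t}" for t in texts]

def transfer(train_en, test_xx, strategy):
    if strategy == "translate-train":
        return translate(train_en, "en", "xx"), test_xx
    if strategy == "translate-test":
        return train_en, translate(test_xx, "xx", "en")
    raise ValueError(f"unknown strategy: {strategy}")

train, test = transfer(["a cat"], ["katu bat"], "translate-test")
print(test)  # ['[xx->en] katu bat']
```

The paper's point is that these two pipelines are not interchangeable: artifacts introduced by translation on one side or the other affect the measured cross-lingual transfer.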
1 code implementation • 4 Apr 2020 • Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune, Eneko Agirre
In the case of textual representations, inference tasks such as Textual Entailment and Semantic Textual Similarity have often been used to benchmark the quality of textual representations.
1 code implementation • LREC 2020 • Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre
This is suboptimal as, for many languages, the models have been trained on smaller (or lower quality) corpora.
no code implementations • 28 Feb 2020 • Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre
In this paper, we analyze the role that such initialization plays in iterative back-translation.
1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods.
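The induction step described here, retrieving translation pairs by nearest neighbor in a shared embedding space, can be sketched with toy 2-d vectors that are assumed to already live in the same space; real systems use higher-dimensional embeddings and refinements such as CSLS.

```python
# Induce word translations by cosine nearest-neighbor retrieval over
# cross-lingual embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def induce_lexicon(src_emb, tgt_emb):
    # For each source word, retrieve the closest target word.
    return {w: max(tgt_emb, key=lambda t: cosine(vec, tgt_emb[t]))
            for w, vec in src_emb.items()}

src = {"dog": (0.9, 0.1), "cat": (0.1, 0.9)}
tgt = {"txakur": (0.8, 0.2), "katu": (0.2, 0.8)}
print(induce_lexicon(src, tgt))  # {'dog': 'txakur', 'cat': 'katu'}
```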
no code implementations • ACL 2019 • Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre
Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.
Bilingual Lexicon Induction • Cross-Lingual Word Embeddings • +1
1 code implementation • ACL 2019 • Yadollah Yaghoobzadeh, Katharina Kann, Timothy J. Hazen, Eneko Agirre, Hinrich Schütze
Word embeddings typically represent different meanings of a word in a single conflated vector.
no code implementations • 10 May 2019 • Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, Mark Cieliebak
We cover each class by introducing the main technologies developed for its dialogue systems and then presenting the corresponding evaluation methods.
1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only.
1 code implementation • CONLL 2018 • Ander Barrena, Aitor Soroa, Eneko Agirre
Named Entity Disambiguation algorithms typically learn a single model for all target entities.
no code implementations • 11 Sep 2018 • Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre
In this paper we introduce vSTS, a new dataset for measuring textual similarity of sentences using multimodal information.
2 code implementations • CONLL 2018 • Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre
Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness.
3 code implementations • EMNLP 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018).
Ranked #3 on Machine Translation on WMT2014 French-English
2 code implementations • ACL 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training.
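The linear-mapping step underlying these offline approaches can be sketched with an orthogonal (Procrustes) transform; the perfectly rotated toy data below stands in for whatever seed dictionary the adversarial or self-learning initialization provides.

```python
# Map one embedding space onto another with the orthogonal transform W
# minimizing ||XW - Y||_F over aligned rows (orthogonal Procrustes).
import numpy as np

def procrustes(X, Y):
    """Solution W = U V^T, where U S V^T is the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # "source" embeddings
theta = np.pi / 5
R = np.eye(4)                         # rotation in the first two dims
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta), np.cos(theta)]]
Y = X @ R                             # "target" space: a pure rotation
W = procrustes(X, Y)
print(np.allclose(X @ W, Y))  # True: the rotation is recovered exactly
```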
no code implementations • WS 2018 • Eneko Agirre, Oier López de Lacalle, Aitor Soroa
UKB is an open source collection of programs for performing, among other tasks, knowledge-based Word Sense Disambiguation (WSD).
2 code implementations • ICLR 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho
In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs.
Ranked #6 on Machine Translation on WMT2015 English-German
no code implementations • SEMEVAL 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
3 code implementations • 31 Jul 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
no code implementations • ACL 2017 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Most methods to learn bilingual word embeddings rely on large parallel corpora, which is difficult to obtain for most language pairs.
no code implementations • COLING 2016 • Haiqing Tang, Deyi Xiong, Oier Lopez de Lacalle, Eneko Agirre
Selecting appropriate translations for source words with multiple meanings still remains a challenge for statistical machine translation (SMT).
no code implementations • WS 2016 • Rosa Gaudio, Gorka Labaka, Eneko Agirre, Petya Osenova, Kiril Simov, Martin Popel, Dieke Oele, Gertjan van Noord, Luís Gomes, João António Rodrigues, Steven Neale, João Silva, Andreia Querido, Nuno Rendeiro, António Branco
no code implementations • LREC 2016 • Arantxa Otegi, Nora Aranberri, Antonio Branco, Jan Hajič, Martin Popel, Kiril Simov, Eneko Agirre, Petya Osenova, Rita Pereira, João Silva, Steven Neale
This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference.
no code implementations • LREC 2016 • Marten Postma, Ruben Izquierdo, Eneko Agirre, German Rigau, Piek Vossen
Word Sense Disambiguation (WSD) systems tend to have a strong bias towards assigning the Most Frequent Sense (MFS), which results in high performance on the MFS but in a very low performance on the less frequent senses.
no code implementations • LREC 2016 • Angel Chang, Valentin I. Spitkovsky, Christopher D. Manning, Eneko Agirre
Named Entity Disambiguation (NED) is the task of linking a named-entity mention to an instance in a knowledge-base, typically Wikipedia-derived resources like DBpedia.
no code implementations • LREC 2016 • Xabier Saralegi, Eneko Agirre, Iñaki Alegria
Translation quality improved in all three types (generalization, specification, and drifting), and CLIR improved for generalization and specification sessions, preserving the performance in drifting sessions.
no code implementations • LREC 2016 • Steven Neale, Luís Gomes, Eneko Agirre, Oier Lopez de Lacalle, António Branco
Although it is commonly assumed that word sense disambiguation (WSD) should help to improve lexical choice and improve the quality of machine translation systems, how to successfully integrate word senses into such systems remains an unanswered question.
no code implementations • 15 Mar 2016 • Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning, Eneko Agirre
Named Entity Disambiguation (NED) is the task of linking a named-entity mention to an instance in a knowledge-base, typically Wikipedia.
no code implementations • IJCNLP 2015 • Roland Roller, Eneko Agirre, Aitor Soroa, Mark Stevenson
Distant supervision is a widely applied approach to automatic training of relation extraction systems and has the advantage that it can generate large amounts of labelled data with minimal effort.
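The bare mechanism of distant supervision can be sketched in a few lines: any sentence that mentions both arguments of a knowledge-base fact is automatically labelled with that fact's relation. Real systems add entity linking and noise filtering on top; the KB and sentences below are toy examples.

```python
# Distant supervision: label sentences with KB relations whenever both
# arguments of a KB fact appear in the sentence.
KB = {("Paris", "France"): "capital_of",
      ("Bilbao", "Spain"): "city_in"}

def distant_label(sentences):
    labelled = []
    for sent in sentences:
        for (e1, e2), rel in KB.items():
            if e1 in sent and e2 in sent:
                labelled.append((sent, e1, e2, rel))
    return labelled

data = distant_label(["Paris is the capital of France.",
                      "Bilbao hosted the final."])
print(data)
# [('Paris is the capital of France.', 'Paris', 'France', 'capital_of')]
```

The second sentence is left unlabelled because only one argument of a KB fact appears in it, which illustrates both the strength (cheap labels at scale) and the weakness (matched sentences may not actually express the relation) of the approach.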
no code implementations • SEMEVAL 2015 • Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe
1 code implementation • 5 Mar 2015 • Eneko Agirre, Ander Barrena, Aitor Soroa
Hyperlinks and other relations in Wikipedia are an extraordinary resource which is still not fully understood.
no code implementations • LREC 2012 • Eneko Agirre, Ander Barrena, Oier Lopez de Lacalle, Aitor Soroa, Samuel Fernando, Mark Stevenson
Digitised Cultural Heritage (CH) items usually have short descriptions and lack rich contextual information.