no code implementations • RANLP 2021 • Oscar M. Cumbicus-Pineda, Itziar Gonzalez-Dios, Aitor Soroa
Edit-based text simplification systems have attained much attention in recent years due to their ability to produce simplification solutions that are interpretable, as well as requiring less training examples compared to traditional seq2seq systems.
1 code implementation • ACL 2022 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.
no code implementations • RANLP 2021 • Cristina Aceta, Izaskun Fernández, Aitor Soroa
This work presents a generic semi-automatic strategy to populate the domain ontology of an ontology-driven task-oriented dialogue system, with the aim of performing successful intent detection in the dialogue process, reusing already existing multilingual resources.
no code implementations • EMNLP (NLP-COVID19) 2020 • Arantxa Otegi, Jon Ander Campos, Gorka Azkune, Aitor Soroa, Eneko Agirre
In this paper we present a quantitative and qualitative analysis of the system.
no code implementations • 15 Mar 2022 • Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa
The vast majority of non-English corpora are derived from automatically filtered versions of CommonCrawl.
no code implementations • 25 Jan 2022 • Angelina McMillan-Major, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Francesco De Toni, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Zeerak Talat, Daniel van Strien, Yacine Jernite
In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models.
1 code implementation • 15 Sep 2021 • Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre
Our results on a visual question answering task which requires external knowledge (OK-VQA) show that our text-only model outperforms pretrained multimodal (image-text) models of comparable number of parameters.
no code implementations • ACL 2021 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.
Bilingual Lexicon Induction
Cross-Lingual Word Embeddings
+2
1 code implementation • 1 Feb 2021 • Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre
Previous work did not use the caption text information, but a manually provided relation holding between the subject and the object.
no code implementations • 31 Dec 2020 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.
Bilingual Lexicon Induction
Cross-Lingual Word Embeddings
+2
1 code implementation • COLING 2020 • Jon Ander Campos, Kyunghyun Cho, Arantxa Otegi, Aitor Soroa, Gorka Azkune, Eneko Agirre
The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility.
1 code implementation • EMNLP 2020 • Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak
In this work, we introduce \emph{Spot The Bot}, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots.
no code implementations • ACL 2020 • Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
We present DoQA, a dataset with 2, 437 dialogues and 10, 917 QA pairs.
no code implementations • 4 May 2020 • Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre
We present DoQA, a dataset with 2, 437 dialogues and 10, 917 QA pairs.
no code implementations • LREC 2020 • Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre
Conversational Question Answering (CQA) systems meet user information needs by having conversations with them, where answers to the questions are retrieved from text.
1 code implementation • 4 Apr 2020 • Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune, Eneko Agirre
In the case of textual representations, inference tasks such as Textual Entailment and Semantic Textual Similarity have been often used to benchmark the quality of textual representations.
1 code implementation • LREC 2020 • Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre
This is suboptimal as, for many languages, the models have been trained on smaller (or lower quality) corpora.
no code implementations • ACL 2019 • Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre
Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.
Bilingual Lexicon Induction
Cross-Lingual Word Embeddings
+1
1 code implementation • CONLL 2018 • Ander Barrena, Aitor Soroa, Eneko Agirre
Named Entity Disambiguation algorithms typically learn a single model for all target entities.
no code implementations • 11 Sep 2018 • Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre
In this paper we introduce vSTS, a new dataset for measuring textual similarity of sentences using multimodal information.
no code implementations • WS 2018 • Eneko Agirre, Oier López de Lacalle, Aitor Soroa
UKB is an open source collection of programs for performing, among other tasks, knowledge-based Word Sense Disambiguation (WSD).
no code implementations • LREC 2016 • Talvany Carlotto, Zuhaitz Beloki, Xabier Artola, Aitor Soroa
That is often caused by the different linguistic formats used across the applications, which leads to attempts to both establish standard formats to represent linguistic information and to create conversion tools to facilitate this integration.
no code implementations • LREC 2016 • Mathijs Kattenberg, Zuhaitz Beloki, Aitor Soroa, Xabier Artola, Antske Fokkens, Paul Huygen, Kees Verstoep
This paper presents two alternative NLP architectures to analyze massive amounts of documents, using parallel processing.
no code implementations • IJCNLP 2015 • Roland Roller, Eneko Agirre, Aitor Soroa, Mark Stevenson
Distant supervision is a widely applied approach to automatic training of relation extraction systems and has the advantage that it can generate large amounts of labelled data with minimal effort.
1 code implementation • 5 Mar 2015 • Eneko Agirre, Ander Barrena, Aitor Soroa
Hyperlinks and other relations in Wikipedia are a extraordinary resource which is still not fully understood.
no code implementations • LREC 2014 • Xabier Artola, Zuhaitz Beloki, Aitor Soroa
Computational power needs have grown dramatically in recent years.
no code implementations • LREC 2012 • Eneko Agirre, Ander Barrena, Oier Lopez de Lacalle, Aitor Soroa, Fern, Samuel o, Mark Stevenson
Digitised Cultural Heritage (CH) items usually have short descriptions and lack rich contextual information.