no code implementations • FEVER (ACL) 2022 • Blanca Calvo Figueras, Montse Oller, Rodrigo Agerri
The influence of fake news on the perception of reality has become a mainstream topic in recent years due to the fast propagation of misleading information.
1 code implementation • LREC 2022 • Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa
Natural Language Understanding (NLU) technology has improved significantly over the last few years, and multitask benchmarks such as GLUE are key to evaluating this improvement in a robust and general way.
1 code implementation • Findings (EMNLP) 2021 • Iker García-Ferrero, Rodrigo Agerri, German Rigau
In the last few years, several methods have been proposed to build meta-embeddings.
no code implementations • SemEval (NAACL) 2022 • Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal
In this paper, we introduce the first SemEval shared task on Structured Sentiment Analysis, for which participants are required to predict all sentiment graphs in a text, where a single sentiment graph is composed of a sentiment holder, target, expression and polarity.
no code implementations • 1 Dec 2023 • Iakes Goenaga, Aitziber Atutxa, Koldo Gojenola, Maite Oronoz, Rodrigo Agerri
Comprehensive experimentation with language models for Spanish shows that sometimes multilingual models fare better than monolingual ones, even outperforming models which have been adapted to the medical domain.
no code implementations • 20 Nov 2023 • Maxime Masson, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose
Extensive experimentation on a newly collected and annotated multilingual (French, English, and Spanish) dataset composed of tourism-related tweets shows that current few-shot learning techniques allow us to obtain competitive results for all three tasks with very little annotation data: 5 tweets per label (15 in total) for Sentiment Analysis, 10% of the tweets for location detection (around 160) and 13% (200 approx.)
1 code implementation • 5 Oct 2023 • Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre
Previous attempts to leverage such information have failed, even with the largest models, as they are not able to follow the guidelines out-of-the-box.
Ranked #1 on Zero-shot Named Entity Recognition (NER) on HarveyNER (using extra training data)
no code implementations • 9 Jun 2023 • Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker García-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau, Anar Yeginbergenova
Providing high-quality explanations for AI predictions based on machine learning is a challenging and complex task.
1 code implementation • 27 Apr 2023 • Nayla Escribano, German Rigau, Rodrigo Agerri
Detecting and normalizing temporal expressions is an essential step for many NLP tasks.
no code implementations • 1 Feb 2023 • Olia Toporkov, Rodrigo Agerri
Given that the process to obtain a lemma from an inflected word can be explained by looking at its morphosyntactic category, including fine-grained morphosyntactic information to train contextual lemmatizers has become common practice, without considering whether that is the optimum in terms of downstream performance.
no code implementations • 25 Jan 2023 • Anar Yeginbergenova, Rodrigo Agerri
Nowadays the medical domain is receiving more and more attention in applications involving Artificial Intelligence.
2 code implementations • 20 Dec 2022 • Iker García-Ferrero, Rodrigo Agerri, German Rigau
In the absence of readily available labeled data for a given sequence labeling task and language, annotation projection has been proposed as one of the possible strategies to automatically generate annotated data.
Ranked #1 on Cross-Lingual NER on MasakhaNER2.0 (Hausa metric)
1 code implementation • 16 Dec 2022 • Rodrigo Agerri, Eneko Agirre
Given the impact of language models on the field of Natural Language Processing, a number of Spanish encoder-only masked language models (aka BERTs) have been trained and released.
4 code implementations • 23 Oct 2022 • Iker García-Ferrero, Rodrigo Agerri, German Rigau
Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages.
Ranked #1 on Cross-Lingual NER on CoNLL 2003
no code implementations • 19 Oct 2022 • Elisa Sanchez-Bayona, Rodrigo Agerri
The lack of wide coverage datasets annotated with everyday metaphorical expressions for languages other than English is striking.
no code implementations • 11 Oct 2022 • Joseba Fernandez de Landa, Rodrigo Agerri
The large majority of the research performed on stance detection has been focused on developing more or less sophisticated text classification systems, even when many benchmarks are based on social network data such as Twitter.
1 code implementation • LREC 2022 • Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri
Parliamentary transcripts are a valuable resource for understanding reality and learning about the most important facts that occur over time in our societies.
no code implementations • 15 Mar 2022 • Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa
For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100.
1 code implementation • EMNLP (ArgMining) 2021 • Yi-Ling Chung, Marco Guerini, Rodrigo Agerri
The growing interest in employing counter narratives for hatred intervention brings with it a focus on dataset creation and automation strategies.
no code implementations • 28 Jan 2021 • Elena Zotova, Rodrigo Agerri, German Rigau
While interactions in social media such as Twitter occur in many natural languages, research on stance detection (the position or attitude expressed with respect to a specific topic) within the Natural Language Processing field has largely been done for English.
no code implementations • LREC 2020 • Elena Zotova, Rodrigo Agerri, Manuel Nuñez, German Rigau
The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish.
1 code implementation • LREC 2020 • Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre
This is suboptimal as, for many languages, the models have been trained on smaller (or lower quality) corpora.
1 code implementation • 31 Mar 2020 • Elena Zotova, Rodrigo Agerri, Manuel Nuñez, German Rigau
The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish.
2 code implementations • 17 Jan 2020 • Iker García-Ferrero, Rodrigo Agerri, German Rigau
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
no code implementations • SEMEVAL 2019 • Rodrigo Agerri
In this paper we describe our participation in the Hyperpartisan News Detection shared task at SemEval 2019.
no code implementations • 28 Jan 2019 • Rodrigo Agerri, German Rigau
In this research note we present a language independent system to model Opinion Target Extraction (OTE) as a sequence labelling task.
Aspect-Based Sentiment Analysis (ABSA)
no code implementations • 28 Sep 2018 • Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri
Crawled data is processed by means of the EliXa Sentiment Analysis system.
no code implementations • SEMEVAL 2015 • Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri
This paper presents a supervised Aspect Based Sentiment Analysis (ABSA) system.
Aspect-Based Sentiment Analysis (ABSA)
no code implementations • 6 Feb 2017 • Iñaki San Vicente, Rodrigo Agerri, German Rigau
This paper presents a simple, robust and (almost) unsupervised dictionary-based method, qwn-ppv (Q-WordNet as Personalized PageRanking Vector) to automatically generate polarity lexicons.
no code implementations • 2 Feb 2017 • Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau
In this paper we present an approach to extract ordered timelines of events, their participants, locations and times from a set of multilingual and cross-lingual data sources.
1 code implementation • 31 Jan 2017 • Rodrigo Agerri, German Rigau
Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models.
Ranked #63 on Named Entity Recognition (NER) on CoNLL 2003 (English)
no code implementations • LREC 2014 • Rodrigo Agerri, Josu Bermudez, German Rigau
IXA pipeline is a modular set of Natural Language Processing tools (or pipes) which provide easy access to NLP technology.
no code implementations • LREC 2014 • Isa Maks, Ruben Izquierdo, Francesca Frontini, Rodrigo Agerri, Piek Vossen, Andoni Azpeitia
In this paper we focus on the creation of general-purpose (as opposed to domain-specific) polarity lexicons in five languages: French, Italian, Dutch, English and Spanish using WordNet propagation.
no code implementations • LREC 2012 • Volha Petukhova, Rodrigo Agerri, Mark Fishel, Sergio Penkale, Arantza del Pozo, Mirjam Sepesy Maučec, Andy Way, Panayota Georgakopoulou, Martin Volk
Subtitling and audiovisual translation have been recognized as areas that could greatly benefit from the introduction of Statistical Machine Translation (SMT) followed by post-editing, in order to increase the efficiency of the subtitle production process.