Search Results for author: Rodrigo Agerri

Found 44 papers, 16 papers with code

BasqueGLUE: A Natural Language Understanding Benchmark for Basque

1 code implementation • LREC 2022 • Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Natural Language Understanding (NLU) technology has improved significantly over the last few years, and multitask benchmarks such as GLUE are key to evaluating this improvement in a robust and general way.

Natural Language Understanding

A Semantics-Aware Approach to Automated Claim Verification

no code implementations • FEVER (ACL) 2022 • Blanca Calvo Figueras, Montse Oller, Rodrigo Agerri

The influence of fake news on the perception of reality has become a mainstream topic in recent years due to the fast propagation of misleading information.

Claim Verification Fact Checking +1

SemEval 2022 Task 10: Structured Sentiment Analysis

no code implementations • SemEval (NAACL) 2022 • Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

In this paper, we introduce the first SemEval shared task on Structured Sentiment Analysis, for which participants are required to predict all sentiment graphs in a text, where a single sentiment graph is composed of a sentiment holder, target, expression and polarity.

Sentiment Analysis

Benchmarking Meta-embeddings: What Works and What Does Not

1 code implementation • Findings (EMNLP) 2021 • Iker García-Ferrero, Rodrigo Agerri, German Rigau

In the last few years, several methods have been proposed to build meta-embeddings.

Benchmarking Embeddings Evaluation

Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain

no code implementations • 11 Apr 2024 • Iker García-Ferrero, Rodrigo Agerri, Aitziber Atutxa Salazar, Elena Cabrio, Iker de la Iglesia, Alberto Lavelli, Bernardo Magnini, Benjamin Molinet, Johana Ramirez-Romero, German Rigau, Jose Maria Villa-Gonzalez, Serena Villata, Andrea Zaninello

While these LLMs display competitive performance on automated medical texts benchmarks, they have been pre-trained and evaluated with a focus on a single language (English mostly).

Natural Language Understanding

Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation

1 code implementation • 10 Apr 2024 • Elisa Sanchez-Bayona, Rodrigo Agerri

Metaphors, although occasionally unperceived, are ubiquitous in our everyday language.

MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering

no code implementations • 8 Apr 2024 • Iñigo Alonso, Maite Oronoz, Rodrigo Agerri

So far the benchmark is available in four languages, but we hope that this work may encourage further development to other languages.

Benchmarking Question Answering

Evaluating Shortest Edit Script Methods for Contextual Lemmatization

1 code implementation • 25 Mar 2024 • Olia Toporkov, Rodrigo Agerri

We experiment with seven languages of different morphological complexity, namely, English, Spanish, Basque, Russian, Czech, Turkish and Polish, using multilingual and language-specific pre-trained masked language encoder-only models as a backbone to build our lemmatizers.

LEMMA Lemmatization +2

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation

no code implementations • 14 Mar 2024 • Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Being a parallel corpus, also with respect to the original English CONAN, it enables novel research on multilingual and crosslingual automatic generation of CNs.

Data Augmentation Machine Translation

Explanatory Argument Extraction of Correct Answers in Resident Medical Exams

no code implementations • 1 Dec 2023 • Iakes Goenaga, Aitziber Atutxa, Koldo Gojenola, Maite Oronoz, Rodrigo Agerri

Comprehensive experimentation with language models for Spanish shows that sometimes multilingual models fare better than monolingual ones, even outperforming models which have been adapted to the medical domain.

Multiple-choice

Optimal Strategies to Perform Multilingual Analysis of Social Content for a Novel Dataset in the Tourism Domain

no code implementations • 20 Nov 2023 • Maxime Masson, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose

Extensive experimentation on a newly collected and annotated multilingual (French, English, and Spanish) dataset composed of tourism-related tweets shows that current few-shot learning techniques allow us to obtain competitive results for all three tasks with very little annotation data: 5 tweets per label (15 in total) for Sentiment Analysis, 10% of the tweets for location detection (around 160) and 13% (200 approx.)

Few-Shot Learning named-entity-recognition +2

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction

1 code implementation • 5 Oct 2023 • Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

In this paper, we propose GoLLIE (Guideline-following Large Language Model for IE), a model able to improve zero-shot results on unseen IE tasks by virtue of being fine-tuned to comply with annotation guidelines.

Ranked #1 on Zero-shot Named Entity Recognition (NER) on HarveyNER (using extra training data)

Event Argument Extraction Language Modelling +6

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine

no code implementations • 9 Jun 2023 • Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau, Anar Yeginbergenova

Providing high quality explanations for AI predictions based on machine learning is a challenging and complex task.

Decision Making Explainable artificial intelligence +1

A Modular Approach for Multilingual Timex Detection and Normalization using Deep Learning and Grammar-based methods

1 code implementation • 27 Apr 2023 • Nayla Escribano, German Rigau, Rodrigo Agerri

Detecting and normalizing temporal expressions is an essential step for many NLP tasks.

Language Modelling Timex normalization

On the Role of Morphological Information for Contextual Lemmatization

no code implementations • 1 Feb 2023 • Olia Toporkov, Rodrigo Agerri

Given that the process to obtain a lemma from an inflected word can be explained by looking at its morphosyntactic category, including fine-grained morphosyntactic information to train contextual lemmatizers has become common practice, without considering whether that is the optimum in terms of downstream performance.

LEMMA Lemmatization

Cross-lingual Argument Mining in the Medical Domain

no code implementations • 25 Jan 2023 • Anar Yeginbergen, Rodrigo Agerri

Nowadays, the medical domain is receiving more and more attention in applications involving Artificial Intelligence, as clinicians' decision-making is increasingly dependent on dealing with enormous amounts of unstructured textual data.

Argument Mining Data Augmentation +2

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks

2 code implementations • 20 Dec 2022 • Iker García-Ferrero, Rodrigo Agerri, German Rigau

In the absence of readily available labeled data for a given sequence labeling task and language, annotation projection has been proposed as one of the possible strategies to automatically generate annotated data.

Ranked #1 on Cross-Lingual NER on MasakhaNER2.0 (Hausa metric)

Cross-Lingual NER Machine Translation +2

Lessons learned from the evaluation of Spanish Language Models

1 code implementation • 16 Dec 2022 • Rodrigo Agerri, Eneko Agirre

Given the impact of language models on the field of Natural Language Processing, a number of Spanish encoder-only masked language models (aka BERTs) have been trained and released.

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings

4 code implementations • 23 Oct 2022 • Iker García-Ferrero, Rodrigo Agerri, German Rigau

Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages.

Ranked #1 on Cross-Lingual NER on CoNLL Spanish

Cross-Lingual NER Machine Translation +1

Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection

no code implementations • 19 Oct 2022 • Elisa Sanchez-Bayona, Rodrigo Agerri

The lack of wide coverage datasets annotated with everyday metaphorical expressions for languages other than English is striking.

Relational Embeddings for Language Independent Stance Detection

no code implementations • 11 Oct 2022 • Joseba Fernandez de Landa, Rodrigo Agerri

The large majority of the research performed on stance detection has been focused on developing more or less sophisticated text classification systems, even when many benchmarks are based on social network data such as Twitter.

Stance Detection text-classification +1

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions

1 code implementation • LREC 2022 • Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

Parliamentary transcripts provide a valuable resource to understand the reality and know about the most important facts that occur over time in our societies.

Does Corpus Quality Really Matter for Low-Resource Languages?

no code implementations • 15 Mar 2022 • Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100.

Representation Learning

Multilingual Counter Narrative Type Classification

1 code implementation • EMNLP (ArgMining) 2021 • Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

The growing interest in employing counter narratives for hatred intervention brings with it a focus on dataset creation and automation strategies.

Classification Vocal Bursts Type Prediction

Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter

no code implementations • 28 Jan 2021 • Elena Zotova, Rodrigo Agerri, German Rigau

While interactions in social media such as Twitter occur in many natural languages, research on stance detection (the position or attitude expressed with respect to a specific topic) within the Natural Language Processing field has largely been done for English.

Stance Detection

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus

no code implementations • LREC 2020 • Elena Zotova, Rodrigo Agerri, Manuel Nuñez, German Rigau

The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish.

Stance Detection

Give your Text Representation Models some Love: the Case for Basque

1 code implementation • LREC 2020 • Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

This is suboptimal as, for many languages, the models have been trained on smaller (or lower quality) corpora.

General Classification NER +6

Multilingual Stance Detection: The Catalonia Independence Corpus

1 code implementation • 31 Mar 2020 • Elena Zotova, Rodrigo Agerri, Manuel Nuñez, German Rigau

The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish.

Stance Detection

A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

2 code implementations • 17 Jan 2020 • Iker García-Ferrero, Rodrigo Agerri, German Rigau

This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.

Cross-Lingual Transfer POS +5

Doris Martin at SemEval-2019 Task 4: Hyperpartisan News Detection with Generic Semi-supervised Features

no code implementations • SEMEVAL 2019 • Rodrigo Agerri

In this paper we describe our participation in the Hyperpartisan News Detection shared task at SemEval 2019.

Clustering Document Classification +1

Language Independent Sequence Labelling for Opinion Target Extraction

no code implementations • 28 Jan 2019 • Rodrigo Agerri, German Rigau

In this research note we present a language independent system to model Opinion Target Extraction (OTE) as a sequence labelling task.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1

Real Time Monitoring of Social Media and Digital Press

no code implementations • 28 Sep 2018 • Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri

Crawled data is processed by means of the EliXa Sentiment Analysis system.

Data Visualization Domain Adaptation +1

Developing New Linguistic Resources and Tools for the Galician Language

no code implementations • LREC 2018 • Rodrigo Agerri, Xavier Gómez Guinovart, German Rigau, Miguel Anxo Solla Portela

Lemmatization Named Entity Recognition (NER) +1

Building Named Entity Recognition Taggers via Parallel Corpora

1 code implementation • LREC 2018 • Rodrigo Agerri, Yi-Ling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

Machine Translation named-entity-recognition +4

Annotating Abstract Meaning Representations for Spanish

no code implementations • LREC 2018 • Noelia Migueles-Abraira, Rodrigo Agerri, Arantza Diaz de Ilarraza

EliXa: A Modular and Flexible ABSA Platform

no code implementations • SEMEVAL 2015 • Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri

This paper presents a supervised Aspect Based Sentiment Analysis (ABSA) system.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1

Q-WordNet PPV: Simple, Robust and (almost) Unsupervised Generation of Polarity Lexicons for Multiple Languages

no code implementations • 6 Feb 2017 • Iñaki San Vicente, Rodrigo Agerri, German Rigau

This paper presents a simple, robust and (almost) unsupervised dictionary-based method, qwn-ppv (Q-WordNet as Personalized PageRanking Vector) to automatically generate polarity lexicons.

Sentiment Analysis

Multilingual and Cross-lingual Timeline Extraction

no code implementations • 2 Feb 2017 • Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau

In this paper we present an approach to extract ordered timelines of events, their participants, locations and times from a set of multilingual and cross-lingual data sources.

Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features

1 code implementation • 31 Jan 2017 • Rodrigo Agerri, German Rigau

Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models.

Ranked #62 on Named Entity Recognition (NER) on CoNLL 2003 (English)

Clustering Multilingual Named Entity Recognition +2

IXA pipeline: Efficient and Ready to Use Multilingual NLP tools

no code implementations • LREC 2014 • Rodrigo Agerri, Josu Bermudez, German Rigau

IXA pipeline is a modular set of Natural Language Processing tools (or pipes) which provide easy access to NLP technology.

Coreference Resolution Multilingual NLP +2

Generating Polarity Lexicons with WordNet propagation in 5 languages

no code implementations • LREC 2014 • Isa Maks, Ruben Izquierdo, Francesca Frontini, Rodrigo Agerri, Piek Vossen, Andoni Azpeitia

In this paper we focus on the creation of general-purpose (as opposed to domain-specific) polarity lexicons in five languages: French, Italian, Dutch, English and Spanish using WordNet propagation.

Named Entity Recognition (NER) Opinion Mining +1

Simple, Robust and (almost) Unsupervised Generation of Polarity Lexicons for Multiple Languages

no code implementations • EACL 2014 • Iñaki San Vicente, Rodrigo Agerri, German Rigau

Opinion Mining Sentiment Analysis

Multilingual, Efficient and Easy NLP Processing with IXA Pipeline

no code implementations • EACL 2014 • Rodrigo Agerri, Josu Bermudez, German Rigau

Coreference Resolution Named Entity Recognition (NER) +1

SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles

no code implementations • LREC 2012 • Volha Petukhova, Rodrigo Agerri, Mark Fishel, Sergio Penkale, Arantza del Pozo, Mirjam Sepesy Maučec, Andy Way, Panayota Georgakopoulou, Martin Volk

Subtitling and audiovisual translation have been recognized as areas that could greatly benefit from the introduction of Statistical Machine Translation (SMT) followed by post-editing, in order to increase the efficiency of the subtitle production process.

Machine Translation Translation
