Search Results for author: Rodrigo Agerri

Found 44 papers, 16 papers with code

BasqueGLUE: A Natural Language Understanding Benchmark for Basque

1 code implementation LREC 2022 Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Natural Language Understanding (NLU) technology has improved significantly over the last few years and multitask benchmarks such as GLUE are key to evaluate this improvement in a robust and general way.

Natural Language Understanding

A Semantics-Aware Approach to Automated Claim Verification

no code implementations FEVER (ACL) 2022 Blanca Calvo Figueras, Montse Oller, Rodrigo Agerri

The influence of fake news in the perception of reality has become a mainstream topic in the last years due to the fast propagation of misleading information.

Claim Verification Fact Checking +1

SemEval 2022 Task 10: Structured Sentiment Analysis

no code implementations SemEval (NAACL) 2022 Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

In this paper, we introduce the first SemEval shared task on Structured Sentiment Analysis, for which participants are required to predict all sentiment graphs in a text, where a single sentiment graph is composed of a sentiment holder, target, expression and polarity.

Sentiment Analysis

Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation

1 code implementation 10 Apr 2024 Elisa Sanchez-Bayona, Rodrigo Agerri

Metaphors, although occasionally unperceived, are ubiquitous in our everyday language.

MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering

no code implementations 8 Apr 2024 Iñigo Alonso, Maite Oronoz, Rodrigo Agerri

So far the benchmark is available in four languages, but we hope that this work may encourage further development to other languages.

Benchmarking Question Answering

Evaluating Shortest Edit Script Methods for Contextual Lemmatization

1 code implementation 25 Mar 2024 Olia Toporkov, Rodrigo Agerri

We experiment with seven languages of different morphological complexity, namely, English, Spanish, Basque, Russian, Czech, Turkish and Polish, using multilingual and language-specific pre-trained masked language encoder-only models as a backbone to build our lemmatizers.

LEMMA Lemmatization +2

Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation

no code implementations 14 Mar 2024 Jaione Bengoetxea, Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Being a parallel corpus, also with respect to the original English CONAN, it allows to perform novel research on multilingual and crosslingual automatic generation of CNs.

Data Augmentation Machine Translation

Explanatory Argument Extraction of Correct Answers in Resident Medical Exams

no code implementations 1 Dec 2023 Iakes Goenaga, Aitziber Atutxa, Koldo Gojenola, Maite Oronoz, Rodrigo Agerri

Comprehensive experimentation with language models for Spanish shows that sometimes multilingual models fare better than monolingual ones, even outperforming models which have been adapted to the medical domain.

Multiple-choice

Optimal Strategies to Perform Multilingual Analysis of Social Content for a Novel Dataset in the Tourism Domain

no code implementations 20 Nov 2023 Maxime Masson, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose

Extensive experimentation on a newly collected and annotated multilingual (French, English, and Spanish) dataset composed of tourism-related tweets shows that current few-shot learning techniques allow us to obtain competitive results for all three tasks with very little annotation data: 5 tweets per label (15 in total) for Sentiment Analysis, 10% of the tweets for location detection (around 160) and 13% (200 approx.)

Few-Shot Learning named-entity-recognition +2

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction

1 code implementation 5 Oct 2023 Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

In this paper, we propose GoLLIE (Guideline-following Large Language Model for IE), a model able to improve zero-shot results on unseen IE tasks by virtue of being fine-tuned to comply with annotation guidelines.

Ranked #1 on Zero-shot Named Entity Recognition (NER) on HarveyNER (using extra training data)

Event Argument Extraction Language Modelling +6

On the Role of Morphological Information for Contextual Lemmatization

no code implementations 1 Feb 2023 Olia Toporkov, Rodrigo Agerri

Given that the process to obtain a lemma from an inflected word can be explained by looking at its morphosyntactic category, including fine-grained morphosyntactic information to train contextual lemmatizers has become common practice, without considering whether that is the optimum in terms of downstream performance.

LEMMA Lemmatization

Cross-lingual Argument Mining in the Medical Domain

no code implementations 25 Jan 2023 Anar Yeginbergen, Rodrigo Agerri

Nowadays the medical domain is receiving more and more attention in applications involving Artificial Intelligence as clinicians' decision-making is increasingly dependent on dealing with enormous amounts of unstructured textual data.

Argument Mining Data Augmentation +2

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks

2 code implementations 20 Dec 2022 Iker García-Ferrero, Rodrigo Agerri, German Rigau

In the absence of readily available labeled data for a given sequence labeling task and language, annotation projection has been proposed as one of the possible strategies to automatically generate annotated data.

Ranked #1 on Cross-Lingual NER on MasakhaNER2.0 (Hausa metric)

Cross-Lingual NER Machine Translation +2

Lessons learned from the evaluation of Spanish Language Models

1 code implementation 16 Dec 2022 Rodrigo Agerri, Eneko Agirre

Given the impact of language models on the field of Natural Language Processing, a number of Spanish encoder-only masked language models (aka BERTs) have been trained and released.

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings

4 code implementations 23 Oct 2022 Iker García-Ferrero, Rodrigo Agerri, German Rigau

Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages.

Cross-Lingual NER Machine Translation +1

Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection

no code implementations 19 Oct 2022 Elisa Sanchez-Bayona, Rodrigo Agerri

The lack of wide coverage datasets annotated with everyday metaphorical expressions for languages other than English is striking.

Relational Embeddings for Language Independent Stance Detection

no code implementations 11 Oct 2022 Joseba Fernandez de Landa, Rodrigo Agerri

The large majority of the research performed on stance detection has been focused on developing more or less sophisticated text classification systems, even when many benchmarks are based on social network data such as Twitter.

Stance Detection text-classification +1

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions

1 code implementation LREC 2022 Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

Parliamentary transcripts provide a valuable resource to understand the reality and know about the most important facts that occur over time in our societies.

Does Corpus Quality Really Matter for Low-Resource Languages?

no code implementations 15 Mar 2022 Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100.

Representation Learning

Multilingual Counter Narrative Type Classification

1 code implementation EMNLP (ArgMining) 2021 Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

The growing interest in employing counter narratives for hatred intervention brings with it a focus on dataset creation and automation strategies.

Classification Vocal Bursts Type Prediction

Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter

no code implementations 28 Jan 2021 Elena Zotova, Rodrigo Agerri, German Rigau

While interactions in social media such as Twitter occur in many natural languages, research on stance detection (the position or attitude expressed with respect to a specific topic) within the Natural Language Processing field has largely been done for English.

Stance Detection

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus

no code implementations LREC 2020 Elena Zotova, Rodrigo Agerri, Manuel Nuñez, German Rigau

The TW-10 referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish.

Stance Detection

Multilingual Stance Detection: The Catalonia Independence Corpus

1 code implementation 31 Mar 2020 Elena Zotova, Rodrigo Agerri, Manuel Nuñez, German Rigau

The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish.

Stance Detection

Language Independent Sequence Labelling for Opinion Target Extraction

no code implementations 28 Jan 2019 Rodrigo Agerri, German Rigau

In this research note we present a language independent system to model Opinion Target Extraction (OTE) as a sequence labelling task.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1

Q-WordNet PPV: Simple, Robust and (almost) Unsupervised Generation of Polarity Lexicons for Multiple Languages

no code implementations 6 Feb 2017 Iñaki San Vicente, Rodrigo Agerri, German Rigau

This paper presents a simple, robust and (almost) unsupervised dictionary-based method, qwn-ppv (Q-WordNet as Personalized PageRanking Vector) to automatically generate polarity lexicons.

Sentiment Analysis

Multilingual and Cross-lingual Timeline Extraction

no code implementations 2 Feb 2017 Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau

In this paper we present an approach to extract ordered timelines of events, their participants, locations and times from a set of multilingual and cross-lingual data sources.

IXA pipeline: Efficient and Ready to Use Multilingual NLP tools

no code implementations LREC 2014 Rodrigo Agerri, Josu Bermudez, German Rigau

IXA pipeline is a modular set of Natural Language Processing tools (or pipes) which provide easy access to NLP technology.

Coreference Resolution Multilingual NLP +2

Generating Polarity Lexicons with WordNet propagation in 5 languages

no code implementations LREC 2014 Isa Maks, Ruben Izquierdo, Francesca Frontini, Rodrigo Agerri, Piek Vossen, Andoni Azpeitia

In this paper we focus on the creation of general-purpose (as opposed to domain-specific) polarity lexicons in five languages: French, Italian, Dutch, English and Spanish using WordNet propagation.

Named Entity Recognition (NER) Opinion Mining +1

SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles

no code implementations LREC 2012 Volha Petukhova, Rodrigo Agerri, Mark Fishel, Sergio Penkale, Arantza del Pozo, Mirjam Sepesy Maučec, Andy Way, Panayota Georgakopoulou, Martin Volk

Subtitling and audiovisual translation have been recognized as areas that could greatly benefit from the introduction of Statistical Machine Translation (SMT) followed by post-editing, in order to increase efficiency of subtitle production process.

Machine Translation Translation
