Search Results for author: Aitor Gonzalez-Agirre

Found 17 papers, 2 papers with code

Spanish Legalese Language Model and Corpora

1 code implementation 23 Oct 2021 Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Aitor Gonzalez-Agirre, Marta Villegas

There are many language models for English, reflecting its worldwide relevance.

Language Modelling

Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models

no code implementations 16 Sep 2021 Casimiro Pio Carrino, Jordi Armengol-Estapé, Ona de Gibert Bonet, Asier Gutiérrez-Fandiño, Aitor Gonzalez-Agirre, Martin Krallinger, Marta Villegas

We introduce CoWeSe (the Corpus Web Salud Español), the largest Spanish biomedical corpus to date, consisting of 4.5GB (about 750M tokens) of clean plain text.

Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario

no code implementations 8 Sep 2021 Casimiro Pio Carrino, Jordi Armengol-Estapé, Asier Gutiérrez-Fandiño, Joan Llop-Palao, Marc Pàmies, Aitor Gonzalez-Agirre, Marta Villegas

To the best of our knowledge, we provide the first biomedical and clinical transformer-based pretrained language models for Spanish, intending to boost native Spanish NLP applications in biomedicine.

Named Entity Recognition NER

Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? A Comprehensive Assessment for Catalan

no code implementations Findings (ACL) 2021 Jordi Armengol-Estapé, Casimiro Pio Carrino, Carlos Rodriguez-Penagos, Ona de Gibert Bonet, Carme Armentano-Oller, Aitor Gonzalez-Agirre, Maite Melero, Marta Villegas

For this, we: (1) build a clean, high-quality textual Catalan corpus (CaText), the largest to date (but only a fraction of the usual size of the previous work in monolingual language models), (2) train a Transformer-based language model for Catalan (BERTa), and (3) devise a thorough evaluation in a diversity of settings, comprising a complete array of downstream tasks, namely, Part of Speech Tagging, Named Entity Recognition and Classification, Text Classification, Question Answering, and Semantic Textual Similarity, with most of the corresponding datasets being created ex novo.

Language Modelling Language understanding

PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track

no code implementations WS 2019 Aitor Gonzalez-Agirre, Montserrat Marimon, Ander Intxaurrondo, Obdulia Rabal, Marta Villegas, Martin Krallinger

We foresee that the PharmaCoNER annotation guidelines, corpus and participant systems will foster the development of new resources for clinical and biomedical text mining systems of Spanish medical data.

Named Entity Recognition

Medical Word Embeddings for Spanish: Development and Evaluation

no code implementations WS 2019 Felipe Soares, Marta Villegas, Aitor Gonzalez-Agirre, Martin Krallinger, Jordi Armengol-Estapé

We performed intrinsic evaluation with our adapted datasets, as well as extrinsic evaluation with a named entity recognition system, using a general-domain embedding as a baseline.

Named Entity Recognition Word Embeddings

Multilingual Central Repository version 3.0

no code implementations LREC 2012 Aitor Gonzalez-Agirre, Egoitz Laparra, German Rigau

This paper describes the upgrading process of the Multilingual Central Repository (MCR).
