Search Results for author: Aitor Soroa

Found 44 papers, 16 papers with code

Matching Cultural Heritage items to Wikipedia

no code implementations • LREC 2012 • Eneko Agirre, Ander Barrena, Oier Lopez de Lacalle, Aitor Soroa, Samuel Fernando, Mark Stevenson

Digitised Cultural Heritage (CH) items usually have short descriptions and lack rich contextual information.

Entity Linking

Comparing Taxonomies for Organising Collections of Documents

no code implementations • COLING 2012 • Samuel Fernando, Mark Hall, Eneko Agirre, Aitor Soroa, Paul Clough, Mark Stevenson

PATHS: A System for Accessing Cultural Heritage Collections

no code implementations • ACL 2013 • Eneko Agirre, Nikolaos Aletras, Paul Clough, Samuel Fernando, Paula Goodale, Mark Hall, Aitor Soroa, Mark Stevenson

Language Modelling

Random Walks for Knowledge-Based Word Sense Disambiguation

no code implementations • CL 2014 • Eneko Agirre, Oier López de Lacalle, Aitor Soroa

Ranked #5 on Word Sense Disambiguation on Knowledge-based:

Information Retrieval Machine Translation +2

A stream computing approach towards scalable NLP

no code implementations • LREC 2014 • Xabier Artola, Zuhaitz Beloki, Aitor Soroa

Computational power needs have grown dramatically in recent years.

Coreference Resolution Decision Making

"One Entity per Discourse" and "One Entity per Collocation" Improve Named-Entity Disambiguation

no code implementations • COLING 2014 • Ander Barrena, Eneko Agirre, Bernardo Cabaleiro, Anselmo Peñas, Aitor Soroa

Entity Disambiguation Machine Translation +1

Exploring the use of word embeddings and random walks on Wikipedia for the CogAlex shared task

no code implementations • WS 2014 • Josu Goikoetxea, Eneko Agirre, Aitor Soroa

Information Retrieval Natural Language Inference +3

Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation

1 code implementation • 5 Mar 2015 • Eneko Agirre, Ander Barrena, Aitor Soroa

Hyperlinks and other relations in Wikipedia are an extraordinary resource which is still not fully understood.

Entity Disambiguation

105

Random Walks and Neural Network Language Models on Knowledge Bases

no code implementations • HLT 2015 • Eneko Agirre, Aitor Soroa, Josu Goikoetxea

Combining Mention Context and Hyperlinks from Wikipedia for Named Entity Disambiguation

no code implementations • SEMEVAL 2015 • Ander Barrena, Aitor Soroa, Eneko Agirre

Entity Disambiguation Entity Linking

Improving distant supervision using inference learning

no code implementations • IJCNLP 2015 • Roland Roller, Eneko Agirre, Aitor Soroa, Mark Stevenson

Distant supervision is a widely applied approach to automatic training of relation extraction systems and has the advantage that it can generate large amounts of labelled data with minimal effort.

Relation Relation Extraction

Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface

no code implementations • LREC 2016 • Talvany Carlotto, Zuhaitz Beloki, Xabier Artola, Aitor Soroa

That is often caused by the different linguistic formats used across the applications, which leads to attempts to both establish standard formats to represent linguistic information and to create conversion tools to facilitate this integration.

Two Architectures for Parallel Processing of Huge Amounts of Text

no code implementations • LREC 2016 • Mathijs Kattenberg, Zuhaitz Beloki, Aitor Soroa, Xabier Artola, Antske Fokkens, Paul Huygen, Kees Verstoep

This paper presents two alternative NLP architectures to analyze massive amounts of documents, using parallel processing.

Vocal Bursts Valence Prediction

Alleviating Poor Context with Background Knowledge for Named Entity Disambiguation

no code implementations • ACL 2016 • Ander Barrena, Aitor Soroa, Eneko Agirre

Entity Disambiguation Entity Linking +1

The risk of sub-optimal use of Open Source NLP Software: UKB is inadvertently state-of-the-art in knowledge-based WSD

no code implementations • WS 2018 • Eneko Agirre, Oier López de Lacalle, Aitor Soroa

UKB is an open source collection of programs for performing, among other tasks, knowledge-based Word Sense Disambiguation (WSD).

Word Sense Disambiguation

Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset

no code implementations • 11 Sep 2018 • Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre

In this paper we introduce vSTS, a new dataset for measuring textual similarity of sentences using multimodal information.

Semantic Textual Similarity Sentence +2

Learning Text Representations for 500K Classification Tasks on Named Entity Disambiguation

1 code implementation • CONLL 2018 • Ander Barrena, Aitor Soroa, Eneko Agirre

Named Entity Disambiguation algorithms typically learn a single model for all target entities.

Data Augmentation Entity Disambiguation +5

Analyzing the Limitations of Cross-lingual Word Embedding Mappings

no code implementations • ACL 2019 • Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +1

Give your Text Representation Models some Love: the Case for Basque

1 code implementation • LREC 2020 • Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

This is suboptimal as, for many languages, the models have been trained on smaller (or lower quality) corpora.

General Classification NER +6

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

1 code implementation • 4 Apr 2020 • Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune, Eneko Agirre

In the case of textual representations, inference tasks such as Textual Entailment and Semantic Textual Similarity have been often used to benchmark the quality of textual representations.

Benchmarking Image Captioning +4

Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque

no code implementations • LREC 2020 • Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre

Conversational Question Answering (CQA) systems meet user information needs by having conversations with them, where answers to the questions are retrieved from text.

Conversational Question Answering Cross-Lingual Transfer

DoQA -- Accessing Domain-Specific FAQs via Conversational QA

no code implementations • 4 May 2020 • Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs.

Conversational Question Answering Information Retrieval +2

DoQA - Accessing Domain-Specific FAQs via Conversational QA

no code implementations • ACL 2020 • Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

We present DoQA, a dataset with 2,437 dialogues and 10,917 QA pairs.

Conversational Question Answering Information Retrieval +2

Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

1 code implementation • EMNLP 2020 • Jan Deriu, Don Tuggener, Pius von Däniken, Jon Ander Campos, Alvaro Rodrigo, Thiziri Belkacem, Aitor Soroa, Eneko Agirre, Mark Cieliebak

In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots.

Chatbot Survival Analysis

Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning

1 code implementation • COLING 2020 • Jon Ander Campos, Kyunghyun Cho, Arantxa Otegi, Aitor Soroa, Gorka Azkune, Eneko Agirre

The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility.

Conversational Question Answering Document Classification

Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring

no code implementations • 31 Dec 2020 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2

Inferring spatial relations from textual descriptions of images

1 code implementation • 1 Feb 2021 • Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre

Previous work did not use the caption text information, but a manually provided relation holding between the subject and the object.

Common Sense Reasoning Object +1

Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring

no code implementations • ACL 2021 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +2

Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering

1 code implementation • 15 Sep 2021 • Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre

Our results on a visual question answering task which requires external knowledge (OK-VQA) show that our text-only model outperforms pretrained multimodal (image-text) models of comparable number of parameters.

Image Captioning Knowledge Graphs +3

Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources

no code implementations • 25 Jan 2022 • Angelina McMillan-Major, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Francesco De Toni, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Zeerak Talat, Daniel van Strien, Yacine Jernite

In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models.

Does Corpus Quality Really Matter for Low-Resource Languages?

no code implementations • 15 Mar 2022 • Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100.

Representation Learning

PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation

1 code implementation • 24 May 2022 • Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre

During inference, we build control codes for the desired meter and rhyme scheme, and condition our language model on them to generate formal verse poetry.

Language Modelling valid

Principled Paraphrase Generation with Parallel Corpora

1 code implementation • ACL 2022 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.

Machine Translation Paraphrase Generation +1

Noisy Channel for Automatic Text Simplification

no code implementations • 6 Nov 2022 • Oscar M Cumbicus-Pineda, Iker Gutiérrez-Fandiño, Itziar Gonzalez-Dios, Aitor Soroa

In this paper we present a simple re-ranking method for Automatic Sentence Simplification based on the noisy channel scheme.

Language Modelling Re-Ranking +2

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

6 code implementations • 9 Nov 2022 • BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Language Modelling Multilingual NLP

2,183

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

no code implementations • 7 Mar 2023 • Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gerard Dupont, Stella Biderman, Anna Rogers, Loubna Ben allal, Francesco De Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, Shayne Longpre, Sebastian Nagel, Leon Weber, Manuel Muñoz, Jian Zhu, Daniel van Strien, Zaid Alyafeai, Khalid Almubarak, Minh Chien Vu, Itziar Gonzalez-Dios, Aitor Soroa, Kyle Lo, Manan Dey, Pedro Ortiz Suarez, Aaron Gokaslan, Shamik Bose, David Adelani, Long Phan, Hieu Tran, Ian Yu, Suhas Pai, Jenny Chim, Violette Lepercq, Suzana Ilic, Margaret Mitchell, Sasha Alexandra Luccioni, Yacine Jernite

As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings.

Ethics Language Modelling

Do Multilingual Language Models Think Better in English?

1 code implementation • 2 Aug 2023 • Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe

In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system by leveraging the few-shot translation capabilities of multilingual language models.

Common Sense Reasoning Cross-Lingual Natural Language Inference +6

Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset

1 code implementation • 1 Mar 2024 • Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre, Frank Keller

We hypothesize that this is because explicit spatial relations rarely appear in the image captions used to train these models.

Image Captioning Text-to-Image Generation

Latxa: An Open Language Model and Evaluation Suite for Basque

1 code implementation • 29 Mar 2024 • Julen Etxaniz, Oscar Sainz, Naiara Perez, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters.

Language Modelling Multiple-choice +1

XNLIeu: a dataset for cross-lingual NLI in Basque

2 code implementations • 10 Apr 2024 • Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa

We have conducted a series of experiments using mono- and multilingual LLMs to assess a) the effect of professional post-edition on the MT system; b) the best cross-lingual strategy for NLI in Basque; and c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation.

Natural Language Inference Natural Language Understanding +1

131

A Syntax-Aware Edit-based System for Text Simplification

no code implementations • RANLP 2021 • Oscar M. Cumbicus-Pineda, Itziar Gonzalez-Dios, Aitor Soroa

Edit-based text simplification systems have attracted much attention in recent years due to their ability to produce simplification solutions that are interpretable, as well as requiring fewer training examples compared to traditional seq2seq systems.

Sentence Text Simplification

Ontology Population Reusing Resources for Dialogue Intent Detection: Generic and Multilingual Approach

no code implementations • RANLP 2021 • Cristina Aceta, Izaskun Fernández, Aitor Soroa

This work presents a generic semi-automatic strategy to populate the domain ontology of an ontology-driven task-oriented dialogue system, with the aim of performing successful intent detection in the dialogue process, reusing already existing multilingual resources.

Intent Detection Management

BasqueGLUE: A Natural Language Understanding Benchmark for Basque

1 code implementation • LREC 2022 • Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Natural Language Understanding (NLU) technology has improved significantly over the last few years and multitask benchmarks such as GLUE are key to evaluate this improvement in a robust and general way.

Natural Language Understanding

Automatic Evaluation vs. User Preference in Neural Textual Question Answering over COVID-19 Scientific Literature

no code implementations • EMNLP (NLP-COVID19) 2020 • Arantxa Otegi, Jon Ander Campos, Gorka Azkune, Aitor Soroa, Eneko Agirre

In this paper we present a quantitative and qualitative analysis of the system.

Information Retrieval Question Answering +2
