Search Results for author: Aitor Soroa

Found 44 papers, 16 papers with code

Matching Cultural Heritage items to Wikipedia

no code implementations LREC 2012 Eneko Agirre, Ander Barrena, Oier Lopez de Lacalle, Aitor Soroa, Fern, Samuel o, Mark Stevenson

Digitised Cultural Heritage (CH) items usually have short descriptions and lack rich contextual information.

Entity Linking

Studying the Wikipedia Hyperlink Graph for Relatedness and Disambiguation

1 code implementation5 Mar 2015 Eneko Agirre, Ander Barrena, Aitor Soroa

Hyperlinks and other relations in Wikipedia are a extraordinary resource which is still not fully understood.

Entity Disambiguation

Improving distant supervision using inference learning

no code implementations IJCNLP 2015 Roland Roller, Eneko Agirre, Aitor Soroa, Mark Stevenson

Distant supervision is a widely applied approach to automatic training of relation extraction systems and has the advantage that it can generate large amounts of labelled data with minimal effort.

Relation Relation Extraction

Interoperability of Annotation Schemes: Using the Pepper Framework to Display AWA Documents in the ANNIS Interface

no code implementations LREC 2016 Talvany Carlotto, Zuhaitz Beloki, Xabier Artola, Aitor Soroa

That is often caused by the different linguistic formats used across the applications, which leads to attempts to both establish standard formats to represent linguistic information and to create conversion tools to facilitate this integration.

Analyzing the Limitations of Cross-lingual Word Embedding Mappings

no code implementations ACL 2019 Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +1

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

1 code implementation4 Apr 2020 Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune, Eneko Agirre

In the case of textual representations, inference tasks such as Textual Entailment and Semantic Textual Similarity have been often used to benchmark the quality of textual representations.

Benchmarking Image Captioning +4

Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque

no code implementations LREC 2020 Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre

Conversational Question Answering (CQA) systems meet user information needs by having conversations with them, where answers to the questions are retrieved from text.

Conversational Question Answering Cross-Lingual Transfer

Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning

1 code implementation COLING 2020 Jon Ander Campos, Kyunghyun Cho, Arantxa Otegi, Aitor Soroa, Gorka Azkune, Eneko Agirre

The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility.

Conversational Question Answering Document Classification

Inferring spatial relations from textual descriptions of images

1 code implementation1 Feb 2021 Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre

Previous work did not use the caption text information, but a manually provided relation holding between the subject and the object.

Common Sense Reasoning Object +1

Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering

1 code implementation15 Sep 2021 Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre

Our results on a visual question answering task which requires external knowledge (OK-VQA) show that our text-only model outperforms pretrained multimodal (image-text) models of comparable number of parameters.

Image Captioning Knowledge Graphs +3

Does Corpus Quality Really Matter for Low-Resource Languages?

no code implementations15 Mar 2022 Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100.

Representation Learning

PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation

1 code implementation24 May 2022 Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre

During inference, we build control codes for the desired meter and rhyme scheme, and condition our language model on them to generate formal verse poetry.

Language Modelling valid

Principled Paraphrase Generation with Parallel Corpora

1 code implementation ACL 2022 Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.

Machine Translation Paraphrase Generation +1

Noisy Channel for Automatic Text Simplification

no code implementations6 Nov 2022 Oscar M Cumbicus-Pineda, Iker Gutiérrez-Fandiño, Itziar Gonzalez-Dios, Aitor Soroa

In this paper we present a simple re-ranking method for Automatic Sentence Simplification based on the noisy channel scheme.

Language Modelling Re-Ranking +2

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

6 code implementations9 Nov 2022 BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Language Modelling Multilingual NLP

Do Multilingual Language Models Think Better in English?

1 code implementation2 Aug 2023 Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe

In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system by leveraging the few-shot translation capabilities of multilingual language models.

Common Sense Reasoning Cross-Lingual Natural Language Inference +6

XNLIeu: a dataset for cross-lingual NLI in Basque

2 code implementations10 Apr 2024 Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa

We have conducted a series of experiments using mono- and multilingual LLMs to assess a) the effect of professional post-edition on the MT system; b) the best cross-lingual strategy for NLI in Basque; and c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation.

Natural Language Inference Natural Language Understanding +1

A Syntax-Aware Edit-based System for Text Simplification

no code implementations RANLP 2021 Oscar M. Cumbicus-Pineda, Itziar Gonzalez-Dios, Aitor Soroa

Edit-based text simplification systems have attained much attention in recent years due to their ability to produce simplification solutions that are interpretable, as well as requiring less training examples compared to traditional seq2seq systems.

Sentence Text Simplification

Ontology Population Reusing Resources for Dialogue Intent Detection: Generic and Multilingual Approach

no code implementations RANLP 2021 Cristina Aceta, Izaskun Fernández, Aitor Soroa

This work presents a generic semi-automatic strategy to populate the domain ontology of an ontology-driven task-oriented dialogue system, with the aim of performing successful intent detection in the dialogue process, reusing already existing multilingual resources.

Intent Detection Management

BasqueGLUE: A Natural Language Understanding Benchmark for Basque

1 code implementation LREC 2022 Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Natural Language Understanding (NLU) technology has improved significantly over the last few years and multitask benchmarks such as GLUE are key to evaluate this improvement in a robust and general way.

Natural Language Understanding

Cannot find the paper you are looking for? You can Submit a new open access paper.