Search Results for author: Hugo Abonizio

Found 10 papers, 8 papers with code

Sabiá-2: A New Generation of Portuguese Large Language Models

no code implementations · 14 Mar 2024 · Thales Sales Almeida, Hugo Abonizio, Rodrigo Nogueira

We introduce Sabiá-2, a family of large language models trained on Portuguese texts.

Math

Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams

1 code implementation · 23 Nov 2023 · Ramon Pires, Thales Sales Almeida, Hugo Abonizio, Rodrigo Nogueira

Recent advancements in language models have showcased human-comparable performance in academic entrance exams.

InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval

1 code implementation · 10 Jul 2023 · Hugo Abonizio, Luiz Bonifacio, Vitor Jeronymo, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

Our toolkit not only reproduces the InPars method and partially reproduces Promptagator, but also provides plug-and-play functionality for using different LLMs, exploring filtering methods, and finetuning various reranker models on the generated data.

Information Retrieval · Retrieval +1
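As a rough illustration of the plug-and-play design described in this entry (not the toolkit's actual API), the sketch below treats the query generator, the filter, and the reranker trainer as interchangeable callables; the names QueryGenerator, ExampleFilter, RerankerTrainer, and synthetic_data_pipeline are hypothetical placeholders.

```python
from typing import Callable, Iterable

# Hypothetical component signatures; the real InPars Toolkit exposes its own
# interfaces. This only illustrates the "swap any piece" idea.
QueryGenerator = Callable[[str], str]          # document -> synthetic query
ExampleFilter = Callable[[str, str], bool]     # (query, document) -> keep?
RerankerTrainer = Callable[[list[tuple[str, str]]], None]

def synthetic_data_pipeline(
    documents: Iterable[str],
    generate_query: QueryGenerator,
    keep_example: ExampleFilter,
    train_reranker: RerankerTrainer,
) -> list[tuple[str, str]]:
    """Generate (query, document) pairs, filter them, and hand them to a trainer."""
    pairs = []
    for doc in documents:
        query = generate_query(doc)        # any LLM can back this call
        if keep_example(query, doc):       # any filtering strategy can back this
            pairs.append((query, doc))
    train_reranker(pairs)                  # any reranker finetuning routine
    return pairs
```

Keeping each stage behind a plain callable is what makes it straightforward to swap in a different LLM, filtering method, or reranker without touching the rest of the pipeline.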

Sabiá: Portuguese Large Language Models

no code implementations · 16 Apr 2023 · Ramon Pires, Hugo Abonizio, Thales Sales Almeida, Rodrigo Nogueira

By evaluating on datasets originally conceived in the target language as well as translated ones, we study the contributions of language-specific pretraining in terms of 1) capturing linguistic nuances and structures inherent to the target language, and 2) enriching the model's knowledge about a domain or culture.

Cultural Vocal Bursts Intensity Prediction

InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval

1 code implementation · 4 Jan 2023 · Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents.

Information Retrieval · Retrieval
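A minimal sketch of the few-shot step described above: in-context (document, query) pairs are concatenated, the target document is appended, and the model is asked to complete the relevant query. The prompt wording and the llm_complete helper are illustrative assumptions, not the papers' actual template or API.

```python
# Illustrative few-shot prompt for synthetic query generation; not the exact
# template used by InPars. `llm_complete` stands in for any text-completion API.
FEW_SHOT_EXAMPLES = [
    ("The Eiffel Tower was completed in 1889 and stands in Paris, France.",
     "when was the eiffel tower completed"),
    ("Python was created by Guido van Rossum and first released in 1991.",
     "who created the python programming language"),
]

def build_prompt(target_document: str) -> str:
    parts = [f"Document: {doc}\nRelevant query: {query}\n"
             for doc, query in FEW_SHOT_EXAMPLES]
    parts.append(f"Document: {target_document}\nRelevant query:")
    return "\n".join(parts)

def generate_query(target_document: str, llm_complete) -> str:
    # llm_complete(prompt: str) -> str is a hypothetical completion function.
    return llm_complete(build_prompt(target_document)).strip()
```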

In Defense of Cross-Encoders for Zero-Shot Retrieval

1 code implementation · 12 Dec 2022 · Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.

Retrieval
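To make the "early query-document interaction" concrete: a cross-encoder feeds the query and the document through the model as a single joint input, so attention mixes their tokens from the first layer onward, unlike a bi-encoder that embeds them separately. A minimal scoring sketch using the sentence-transformers CrossEncoder wrapper and a public MS MARCO checkpoint (chosen for illustration, not one of the paper's models) follows.

```python
# Rerank a few documents for one query with a cross-encoder.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "what is zero-shot retrieval"
documents = [
    "Zero-shot retrieval evaluates a ranker on domains it was never trained on.",
    "The recipe calls for two cups of flour and one egg.",
]

# Each (query, document) pair is encoded jointly, yielding one relevance score.
scores = model.predict([(query, doc) for doc in documents])
for doc, score in sorted(zip(documents, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```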

MonoByte: A Pool of Monolingual Byte-level Language Models

1 code implementation · COLING 2022 · Hugo Abonizio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira

The zero-shot cross-lingual ability of models pretrained on multilingual and even monolingual corpora has spurred many hypotheses to explain this intriguing empirical result.

Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task

1 code implementation · 30 May 2022 · Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Roberto Lotufo, Rodrigo Nogueira

Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios.

Language Modelling

InPars: Data Augmentation for Information Retrieval using Large Language Models

1 code implementation · 10 Feb 2022 · Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Rodrigo Nogueira

In this work, we harness the few-shot capabilities of large pretrained language models as synthetic data generators for IR tasks.

Data Augmentation · Information Retrieval +2
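To connect the snippet above to reranker training, one common recipe (a simplification, not necessarily the paper's exact procedure) pairs each synthetic query with its source document as a positive example and samples negatives from other documents; harder negatives are often mined from a first-stage retriever such as BM25.

```python
import random

def build_training_examples(pairs, all_documents, negatives_per_query=1, seed=0):
    """Turn synthetic (query, source document) pairs into (query, doc, label) triples.

    Random negatives are a simplification; in practice harder negatives are
    usually mined from a first-stage retriever.
    """
    rng = random.Random(seed)
    examples = []
    for query, positive_doc in pairs:
        examples.append((query, positive_doc, 1))
        candidates = [d for d in all_documents if d != positive_doc]
        k = min(negatives_per_query, len(candidates))
        for negative_doc in rng.sample(candidates, k):
            examples.append((query, negative_doc, 0))
    return examples
```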
