Search Results for author: Roberto Lotufo

Found 34 papers, 27 papers with code

Measuring Cross-lingual Transfer in Bytes

2 code implementations • 12 Apr 2024 • Leandro Rodrigues de Souza, Thales Sales Almeida, Roberto Lotufo, Rodrigo Nogueira

We also found evidence that this transfer is not related to language contamination or language proximity, which strengthens the hypothesis that the model also relies on language-agnostic knowledge.

Cross-Lingual Transfer

Lissard: Long and Simple Sequential Reasoning Datasets

no code implementations • 12 Feb 2024 • Mirelle Bueno, Roberto Lotufo, Rodrigo Nogueira

Language models are now capable of solving tasks that require dealing with long sequences consisting of hundreds of thousands of tokens.

ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs

1 code implementation • 9 Feb 2024 • Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira

ExaRanker recently introduced an approach to training information retrieval (IR) models, incorporating natural language explanations as additional labels.

Data Augmentation • Information Retrieval +1
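The ExaRanker idea of using natural language explanations as additional labels can be sketched as follows. This is a hypothetical illustration, not the paper's exact template: the `make_target` and `parse_label` helpers and the target format are assumptions about how a seq2seq reranker's training targets might be augmented.

```python
# Hypothetical sketch of "explanation as additional label":
# instead of training a seq2seq reranker to emit only "true"/"false",
# the target sequence also carries a natural-language explanation.

def make_target(relevant: bool, explanation: str) -> str:
    """Build a training target: relevance label followed by an explanation."""
    label = "true" if relevant else "false"
    return f"{label}. Explanation: {explanation}"

def parse_label(output: str) -> bool:
    """At ranking time only the leading label is needed to score a passage."""
    return output.split(".", 1)[0].strip() == "true"

target = make_target(True, "the passage directly answers the query.")
```

The explanation is only supervision: once trained, the ranker can be scored on the leading label alone.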

InRanker: Distilled Rankers for Zero-shot Information Retrieval

no code implementations • 12 Jan 2024 • Thiago Laitz, Konstantinos Papakostas, Roberto Lotufo, Rodrigo Nogueira

Despite multi-billion parameter neural rankers being common components of state-of-the-art information retrieval pipelines, they are rarely used in production due to the enormous amount of compute required for inference.

Information Retrieval • Language Modelling +2
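Distilling a multi-billion-parameter reranker into a smaller one, as InRanker does, typically means training the student on the teacher's soft relevance scores. The sketch below shows a generic KL-divergence distillation objective over the relevant/non-relevant distribution; it is a standard soft-label formulation, not the paper's specific recipe.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student): the student is pushed toward the
    teacher's soft relevant/non-relevant distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher incurs (near-)zero loss;
# one that inverts the teacher's preference is penalized.
loss_same = kl_distillation_loss([2.0, -1.0], [2.0, -1.0])
loss_diff = kl_distillation_loss([2.0, -1.0], [-1.0, 2.0])
```

Because the targets are soft scores rather than binary labels, the student can inherit the teacher's calibrated preferences without ever seeing human judgments.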

INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and Challenges

no code implementations • 10 Jan 2024 • Jayr Pereira, Andre Assumpcao, Julio Trecenti, Luiz Airosa, Caio Lente, Jhonatan Cléto, Guilherme Dobins, Rodrigo Nogueira, Luis Mitchell, Roberto Lotufo

This paper introduces INACIA (Instrução Assistida com Inteligência Artificial), a groundbreaking system designed to integrate Large Language Models (LLMs) into the operational framework of the Brazilian Federal Court of Accounts (TCU).

Decision Making • Fairness

InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval

1 code implementation • 10 Jul 2023 • Hugo Abonizio, Luiz Bonifacio, Vitor Jeronymo, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

Our toolkit not only reproduces the InPars method and partially reproduces Promptagator, but also provides plug-and-play functionality for using different LLMs, exploring filtering methods, and fine-tuning various reranker models on the generated data.

Information Retrieval • Retrieval +1

Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams

1 code implementation • 29 Mar 2023 • Desnes Nunes, Ricardo Primi, Ramon Pires, Roberto Lotufo, Rodrigo Nogueira

The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities.

Multiple-choice

NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval

1 code implementation • 28 Mar 2023 • Vitor Jeronymo, Roberto Lotufo, Rodrigo Nogueira

This paper reports on a study of cross-lingual information retrieval (CLIR) using the mT5-XXL reranker on the NeuCLIR track of TREC 2022.

Cross-Lingual Information Retrieval • Retrieval

ExaRanker: Explanation-Augmented Neural Ranker

1 code implementation • 25 Jan 2023 • Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira

Recent work has shown that inducing a large language model (LLM) to generate explanations prior to outputting an answer is an effective strategy to improve performance on a wide range of reasoning tasks.

Language Modelling • Large Language Model +1

InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval

1 code implementation • 4 Jan 2023 • Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents.

Information Retrieval • Retrieval
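The InPars recipe of inducing an LLM to generate relevant queries via few-shot examples amounts to assembling a prompt like the one below. The exact template is an assumption for illustration; `build_inpars_style_prompt` is a hypothetical helper, and the resulting string would be sent to whatever LLM is being used.

```python
def build_inpars_style_prompt(examples, document):
    """Assemble a few-shot prompt asking an LLM to write a relevant
    query for a document (template is illustrative, not the paper's)."""
    parts = []
    for doc, query in examples:
        parts.append(f"Document: {doc}\nRelevant query: {query}\n")
    # The final document is left open-ended so the LLM completes the query.
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n".join(parts)

examples = [
    ("The capital of France is Paris.", "what is the capital of france"),
]
prompt = build_inpars_style_prompt(
    examples, "Water boils at 100 degrees Celsius at sea level."
)
```

Each generated (query, document) pair then becomes a synthetic positive for training a retrieval model.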

Visconde: Multi-document QA with GPT-3 and Neural Reranking

1 code implementation • 19 Dec 2022 • Jayr Pereira, Robson Fidalgo, Roberto Lotufo, Rodrigo Nogueira

This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents.

Language Modelling • Large Language Model +2

In Defense of Cross-Encoders for Zero-Shot Retrieval

1 code implementation • 12 Dec 2022 • Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.

Retrieval

NeuralSearchX: Serving a Multi-billion-parameter Reranker for Multilingual Metasearch at a Low Cost

no code implementations • 26 Oct 2022 • Thales Sales Almeida, Thiago Laitz, João Seródio, Luiz Henrique Bonifacio, Roberto Lotufo, Rodrigo Nogueira

We compare our system with Microsoft's Biomedical Search and show that our design choices led to a much more cost-effective system with competitive QPS and close to state-of-the-art results on a wide range of public benchmarks.

Retrieval

Open-source tool for Airway Segmentation in Computed Tomography using 2.5D Modified EfficientDet: Contribution to the ATM22 Challenge

1 code implementation • 29 Sep 2022 • Diedre Carmo, Leticia Rittner, Roberto Lotufo

Airway segmentation in computed tomography images can be used to analyze pulmonary diseases; however, manual segmentation is labor-intensive and relies on expert knowledge.

Segmentation

mRobust04: A Multilingual Version of the TREC Robust 2004 Benchmark

no code implementations • 27 Sep 2022 • Vitor Jeronymo, Mauricio Nascimento, Roberto Lotufo, Rodrigo Nogueira

Robust 2004 is an information retrieval benchmark whose large number of judgments per query makes it a reliable evaluation dataset.

Information Retrieval • Retrieval

MonoByte: A Pool of Monolingual Byte-level Language Models

1 code implementation • COLING 2022 • Hugo Abonizio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira

The zero-shot cross-lingual ability of models pretrained on multilingual and even monolingual corpora has spurred many hypotheses to explain this intriguing empirical result.

Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models

1 code implementation • 24 Aug 2022 • Mirelle Bueno, Carlos Gemmell, Jeffrey Dalton, Roberto Lotufo, Rodrigo Nogueira

Our experimental results show that generating step-by-step rationales and introducing marker tokens are both required for effective extrapolation.

Language Modelling

Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task

1 code implementation • 30 May 2022 • Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Roberto Lotufo, Rodrigo Nogueira

Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios.

Language Modelling

mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset

1 code implementation • 31 Aug 2021 • Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset comprising 13 languages that was created using machine translation.

Information Retrieval • Machine Translation +4

A cost-benefit analysis of cross-lingual transfer methods

2 code implementations • 14 May 2021 • Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira

An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner.

Cross-Lingual Transfer • Translation

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data

3 code implementations • 20 Aug 2020 • Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo

In natural language processing (NLP), there is a need for more resources in Portuguese, since much of the data used in state-of-the-art research is in other languages.

Lite Training Strategies for Portuguese-English and English-Portuguese Translation

1 code implementation • WMT (EMNLP) 2020 • Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini

Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models.

Machine Translation • Translation

Electricity Theft Detection with self-attention

1 code implementation • 14 Feb 2020 • Paulo Finardi, Israel Campiotti, Gustavo Plensack, Rafael Derradi de Souza, Rodrigo Nogueira, Gustavo Pinheiro, Roberto Lotufo

In this work we propose a novel self-attention model to address electricity theft detection on a realistic, imbalanced dataset of daily electricity consumption provided by the State Grid Corporation of China.

Position

Hippocampus Segmentation on Epilepsy and Alzheimer's Disease Studies with Multiple Convolutional Neural Networks

3 code implementations • 14 Jan 2020 • Diedre Carmo, Bruna Silva, Clarissa Yasuda, Letícia Rittner, Roberto Lotufo

We test this methodology alongside other recent deep learning methods in two domains: the HarP test set and an in-house epilepsy dataset named HCUnicamp, which contains hippocampus resections.

Hippocampus Segmentation

Portuguese Named Entity Recognition using BERT-CRF

1 code implementation • 23 Sep 2019 • Fábio Souza, Rodrigo Nogueira, Roberto Lotufo

Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering.

named-entity-recognition • Named Entity Recognition +2

Extended 2D Consensus Hippocampus Segmentation

3 code implementations • 12 Feb 2019 • Diedre Carmo, Bruna Silva, Clarissa Yasuda, Letícia Rittner, Roberto Lotufo

Segmentation done by experts is considered the gold standard when evaluating automated methods, but it is a time-consuming and arduous task requiring specialized personnel.

Hippocampus Segmentation

Convolutional Neural Networks for Skull-stripping in Brain MR Imaging using Consensus-based Silver standard Masks

1 code implementation • 13 Apr 2018 • Oeslle Lucena, Roberto Souza, Leticia Rittner, Richard Frayne, Roberto Lotufo

Our use of silver standard masks reduced the cost of manual annotation, decreased inter- and intra-rater variability, and avoided the CNN segmentation super-specialization toward one specific manual annotation guideline that can occur when gold standard masks are used.

Skull Stripping
