Search Results for author: Roberto Lotufo

Found 34 papers, 27 papers with code

Measuring Cross-lingual Transfer in Bytes

2 code implementations • 12 Apr 2024 • Leandro Rodrigues de Souza, Thales Sales Almeida, Roberto Lotufo, Rodrigo Nogueira

We also found evidence that this transfer is not related to language contamination or language proximity, which strengthens the hypothesis that the model also relies on language-agnostic knowledge.

Cross-Lingual Transfer

Lissard: Long and Simple Sequential Reasoning Datasets

no code implementations • 12 Feb 2024 • Mirelle Bueno, Roberto Lotufo, Rodrigo Nogueira

Language models are now capable of solving tasks that require dealing with long sequences consisting of hundreds of thousands of tokens.

ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs

1 code implementation • 9 Feb 2024 • Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira

ExaRanker recently introduced an approach to training information retrieval (IR) models, incorporating natural language explanations as additional labels.

Data Augmentation • Information Retrieval +1
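The ExaRanker idea of using natural language explanations as additional labels can be sketched as follows. This is a hypothetical illustration, not the paper's exact template: the `make_target` and `parse_label` helpers and the target format are assumptions about how a seq2seq reranker's training targets might be augmented.

```python
# Hypothetical sketch of "explanation as additional label":
# instead of training a seq2seq reranker to emit only "true"/"false",
# the target sequence also carries a natural-language explanation.

def make_target(relevant: bool, explanation: str) -> str:
    """Build a training target: relevance label followed by an explanation."""
    label = "true" if relevant else "false"
    return f"{label}. Explanation: {explanation}"

def parse_label(output: str) -> bool:
    """At ranking time only the leading label is needed to score a passage."""
    return output.split(".", 1)[0].strip() == "true"

target = make_target(True, "the passage directly answers the query.")
```

The explanation is only supervision: once trained, the ranker can be scored on the leading label alone.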

InRanker: Distilled Rankers for Zero-shot Information Retrieval

no code implementations • 12 Jan 2024 • Thiago Laitz, Konstantinos Papakostas, Roberto Lotufo, Rodrigo Nogueira

Despite multi-billion parameter neural rankers being common components of state-of-the-art information retrieval pipelines, they are rarely used in production due to the enormous amount of compute required for inference.

Information Retrieval • Language Modelling +2
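Distilling a multi-billion-parameter reranker into a smaller one, as InRanker does, typically means training the student on the teacher's soft relevance scores. The sketch below shows a generic KL-divergence distillation objective over the relevant/non-relevant distribution; it is a standard soft-label formulation, not the paper's specific recipe.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student): the student is pushed toward the
    teacher's soft relevant/non-relevant distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher incurs (near-)zero loss;
# one that inverts the teacher's preference is penalized.
loss_same = kl_distillation_loss([2.0, -1.0], [2.0, -1.0])
loss_diff = kl_distillation_loss([2.0, -1.0], [-1.0, 2.0])
```

Because the targets are soft scores rather than binary labels, the student can inherit the teacher's calibrated preferences without ever seeing human judgments.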

INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and Challenges

no code implementations • 10 Jan 2024 • Jayr Pereira, Andre Assumpcao, Julio Trecenti, Luiz Airosa, Caio Lente, Jhonatan Cléto, Guilherme Dobins, Rodrigo Nogueira, Luis Mitchell, Roberto Lotufo

This paper introduces INACIA (Instrução Assistida com Inteligência Artificial), a groundbreaking system designed to integrate Large Language Models (LLMs) into the operational framework of the Brazilian Federal Court of Accounts (TCU).

Decision Making • Fairness

InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval

1 code implementation • 10 Jul 2023 • Hugo Abonizio, Luiz Bonifacio, Vitor Jeronymo, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

Our toolkit not only reproduces the InPars method and partially reproduces Promptagator, but also provides plug-and-play functionality for using different LLMs, exploring filtering methods, and fine-tuning various reranker models on the generated data.

Information Retrieval • Retrieval +1

Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams

1 code implementation • 29 Mar 2023 • Desnes Nunes, Ricardo Primi, Ramon Pires, Roberto Lotufo, Rodrigo Nogueira

The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities.

Multiple-choice

NeuralMind-UNICAMP at 2022 TREC NeuCLIR: Large Boring Rerankers for Cross-lingual Retrieval

1 code implementation • 28 Mar 2023 • Vitor Jeronymo, Roberto Lotufo, Rodrigo Nogueira

This paper reports on a study of cross-lingual information retrieval (CLIR) using the mT5-XXL reranker on the NeuCLIR track of TREC 2022.

Cross-Lingual Information Retrieval • Retrieval

ExaRanker: Explanation-Augmented Neural Ranker

1 code implementation • 25 Jan 2023 • Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira

Recent work has shown that inducing a large language model (LLM) to generate explanations prior to outputting an answer is an effective strategy to improve performance on a wide range of reasoning tasks.

Language Modelling • Large Language Model +1

InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval

1 code implementation • 4 Jan 2023 • Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira

Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents.

Information Retrieval • Retrieval
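The InPars recipe of inducing an LLM to generate relevant queries via few-shot examples amounts to assembling a prompt like the one below. The exact template is an assumption for illustration; `build_inpars_style_prompt` is a hypothetical helper, and the resulting string would be sent to whatever LLM is being used.

```python
def build_inpars_style_prompt(examples, document):
    """Assemble a few-shot prompt asking an LLM to write a relevant
    query for a document (template is illustrative, not the paper's)."""
    parts = []
    for doc, query in examples:
        parts.append(f"Document: {doc}\nRelevant query: {query}\n")
    # The final document is left open-ended so the LLM completes the query.
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n".join(parts)

examples = [
    ("The capital of France is Paris.", "what is the capital of france"),
]
prompt = build_inpars_style_prompt(
    examples, "Water boils at 100 degrees Celsius at sea level."
)
```

Each generated (query, document) pair then becomes a synthetic positive for training a retrieval model.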

Visconde: Multi-document QA with GPT-3 and Neural Reranking

1 code implementation • 19 Dec 2022 • Jayr Pereira, Robson Fidalgo, Roberto Lotufo, Rodrigo Nogueira

This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents.

Language Modelling • Large Language Model +2

In Defense of Cross-Encoders for Zero-Shot Retrieval

1 code implementation • 12 Dec 2022 • Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.

Retrieval

NeuralSearchX: Serving a Multi-billion-parameter Reranker for Multilingual Metasearch at a Low Cost

no code implementations • 26 Oct 2022 • Thales Sales Almeida, Thiago Laitz, João Seródio, Luiz Henrique Bonifacio, Roberto Lotufo, Rodrigo Nogueira

We compare our system with Microsoft's Biomedical Search and show that our design choices led to a much more cost-effective system with competitive QPS and close to state-of-the-art results on a wide range of public benchmarks.

Retrieval

Open-source tool for Airway Segmentation in Computed Tomography using 2.5D Modified EfficientDet: Contribution to the ATM22 Challenge

1 code implementation • 29 Sep 2022 • Diedre Carmo, Leticia Rittner, Roberto Lotufo

Airway segmentation in computed tomography images can be used to analyze pulmonary diseases; however, manual segmentation is labor-intensive and relies on expert knowledge.

Segmentation

mRobust04: A Multilingual Version of the TREC Robust 2004 Benchmark

no code implementations • 27 Sep 2022 • Vitor Jeronymo, Mauricio Nascimento, Roberto Lotufo, Rodrigo Nogueira

Robust 2004 is an information retrieval benchmark whose large number of judgments per query makes it a reliable evaluation dataset.

Information Retrieval • Retrieval

MonoByte: A Pool of Monolingual Byte-level Language Models

1 code implementation • COLING 2022 • Hugo Abonizio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira

The zero-shot cross-lingual ability of models pretrained on multilingual and even monolingual corpora has spurred many hypotheses to explain this intriguing empirical result.

Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models

1 code implementation • 24 Aug 2022 • Mirelle Bueno, Carlos Gemmell, Jeffrey Dalton, Roberto Lotufo, Rodrigo Nogueira

Our experimental results show that generating step-by-step rationales and introducing marker tokens are both required for effective extrapolation.

Language Modelling

Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task

1 code implementation • 30 May 2022 • Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Roberto Lotufo, Rodrigo Nogueira

Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios.

Language Modelling

mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset

1 code implementation • 31 Aug 2021 • Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset comprising 13 languages that was created using machine translation.

Information Retrieval • Machine Translation +4

A cost-benefit analysis of cross-lingual transfer methods

2 code implementations • 14 May 2021 • Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira

An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner.

Cross-Lingual Transfer • Translation

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data

3 code implementations • 20 Aug 2020 • Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo

In natural language processing (NLP), there is a need for more resources in Portuguese, since much of the data used in state-of-the-art research is in other languages.

Lite Training Strategies for Portuguese-English and English-Portuguese Translation

1 code implementation • WMT (EMNLP) 2020 • Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini

Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models.

Machine Translation • Translation

Electricity Theft Detection with self-attention

1 code implementation • 14 Feb 2020 • Paulo Finardi, Israel Campiotti, Gustavo Plensack, Rafael Derradi de Souza, Rodrigo Nogueira, Gustavo Pinheiro, Roberto Lotufo

In this work we propose a novel self-attention model to address electricity theft detection on a realistic, imbalanced dataset of daily electricity consumption provided by the State Grid Corporation of China.

Position

Hippocampus Segmentation on Epilepsy and Alzheimer's Disease Studies with Multiple Convolutional Neural Networks

3 code implementations • 14 Jan 2020 • Diedre Carmo, Bruna Silva, Clarissa Yasuda, Letícia Rittner, Roberto Lotufo

We test this methodology alongside other recent deep learning methods in two domains: the HarP test set and an in-house epilepsy dataset named HCUnicamp, which contains hippocampus resections.

Hippocampus Segmentation

Portuguese Named Entity Recognition using BERT-CRF

1 code implementation • 23 Sep 2019 • Fábio Souza, Rodrigo Nogueira, Roberto Lotufo

Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering.

named-entity-recognition • Named Entity Recognition +2

Extended 2D Consensus Hippocampus Segmentation

3 code implementations • 12 Feb 2019 • Diedre Carmo, Bruna Silva, Clarissa Yasuda, Letícia Rittner, Roberto Lotufo

Segmentation done by experts is considered the gold standard when evaluating automated methods, but it is a time-consuming and arduous task requiring specialized personnel.

Hippocampus Segmentation

Convolutional Neural Networks for Skull-stripping in Brain MR Imaging using Consensus-based Silver standard Masks

1 code implementation • 13 Apr 2018 • Oeslle Lucena, Roberto Souza, Leticia Rittner, Richard Frayne, Roberto Lotufo

Our use of silver standard masks reduced the cost of manual annotation, decreased inter- and intra-rater variability, and avoided the CNN segmentation super-specialization toward one specific manual annotation guideline that can occur when gold standard masks are used.

Skull Stripping
