no code implementations • 10 Oct 2024 • Leandro Carísio Fernandes, Guilherme Zeferino Rodrigues Dobins, Roberto Lotufo, Jayr Alencar Pereira
This paper introduces PublicHearingBR, a Brazilian Portuguese dataset designed for summarizing long documents.
1 code implementation • 8 Oct 2024 • Mirelle Bueno, Roberto Lotufo, Rodrigo Nogueira
Our evaluation of open-source and proprietary models shows a consistent decline in performance across all models and languages as the complexity of the sequence increases.
no code implementations • 29 Aug 2024 • Leandro Carísio Fernandes, Gustavo Bartz Guedes, Thiago Soares Laitz, Thales Sales Almeida, Rodrigo Nogueira, Roberto Lotufo, Jayr Pereira
Document summarization is the task of shortening texts into concise and informative summaries.
no code implementations • 19 Jul 2024 • Jayr Pereira, Andre Assumpcao, Roberto Lotufo
In this paper, we propose Check-Eval, a novel evaluation framework leveraging LLMs to assess the quality of generated text through a checklist-based approach.
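A minimal sketch of the checklist idea described above, with an illustrative stub in place of the LLM judge (the checklist items, the `keyword_judge` helper, and the scoring rule are assumptions for demonstration, not the paper's actual prompts or protocol):

```python
# Checklist-based evaluation sketch: a judge answers yes/no for each
# checklist item, and the score is the fraction of items satisfied.

def check_eval_score(candidate: str, checklist: list[str], judge) -> float:
    """Score = number of satisfied checklist items / total items."""
    passed = sum(1 for item in checklist if judge(candidate, item))
    return passed / len(checklist)

# Stub judge for demonstration only: an item counts as satisfied if its
# last word appears in the candidate text. A real system would query an
# LLM with the candidate and the checklist item here.
def keyword_judge(candidate: str, item: str) -> bool:
    keyword = item.split()[-1].lower()
    return keyword in candidate.lower()

checklist = ["mentions the dataset", "reports the metric"]
score = check_eval_score("The dataset is new.", checklist, keyword_judge)
print(score)  # 0.5
```

The stub makes the scoring rule concrete; swapping `keyword_judge` for an LLM call recovers the checklist-based setup the abstract describes.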
no code implementations • 16 Jun 2024 • Marcos Piau, Roberto Lotufo, Rodrigo Nogueira
However, the impact of different pretraining settings on downstream tasks remains underexplored.
1 code implementation • 12 Apr 2024 • Leandro Rodrigues de Souza, Thales Sales Almeida, Roberto Lotufo, Rodrigo Nogueira
We also found evidence that this transfer is not related to language contamination or language proximity, which strengthens the hypothesis that the model also relies on language-agnostic knowledge.
1 code implementation • 12 Feb 2024 • Mirelle Bueno, Roberto Lotufo, Rodrigo Nogueira
Language models are now capable of solving tasks that require dealing with long sequences consisting of hundreds of thousands of tokens.
1 code implementation • 9 Feb 2024 • Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira
ExaRanker recently introduced an approach to training information retrieval (IR) models, incorporating natural language explanations as additional labels.
no code implementations • 12 Jan 2024 • Thiago Laitz, Konstantinos Papakostas, Roberto Lotufo, Rodrigo Nogueira
Despite multi-billion parameter neural rankers being common components of state-of-the-art information retrieval pipelines, they are rarely used in production due to the enormous amount of compute required for inference.
no code implementations • 10 Jan 2024 • Jayr Pereira, Andre Assumpcao, Julio Trecenti, Luiz Airosa, Caio Lente, Jhonatan Cléto, Guilherme Dobins, Rodrigo Nogueira, Luis Mitchell, Roberto Lotufo
This paper introduces INACIA (Instrução Assistida com Inteligência Artificial), a groundbreaking system designed to integrate Large Language Models (LLMs) into the operational framework of the Brazilian Federal Court of Accounts (TCU).
1 code implementation • 10 Jul 2023 • Hugo Abonizio, Luiz Bonifacio, Vitor Jeronymo, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira
Our toolkit not only reproduces the InPars method and partially reproduces Promptagator, but also provides plug-and-play functionality for using different LLMs, exploring filtering methods, and fine-tuning various reranker models on the generated data.
1 code implementation • 12 Apr 2023 • Diedre Carmo, Gustavo Pinheiro, Lívia Rodrigues, Thays Abreu, Roberto Lotufo, Letícia Rittner
Medical image segmentation is an increasingly popular area of research in medical imaging processing and analysis.
1 code implementation • 29 Mar 2023 • Desnes Nunes, Ricardo Primi, Ramon Pires, Roberto Lotufo, Rodrigo Nogueira
The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities.
1 code implementation • 28 Mar 2023 • Vitor Jeronymo, Roberto Lotufo, Rodrigo Nogueira
This paper reports on a study of cross-lingual information retrieval (CLIR) using the mT5-XXL reranker on the NeuCLIR track of TREC 2022.
1 code implementation • 25 Jan 2023 • Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira
Recent work has shown that inducing a large language model (LLM) to generate explanations prior to outputting an answer is an effective strategy to improve performance on a wide range of reasoning tasks.
1 code implementation • 4 Jan 2023 • Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira
Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents.
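The few-shot induction step described above can be sketched as prompt construction (the example pairs and template wording here are illustrative assumptions, not the paper's exact prompt): given a handful of (document, relevant query) pairs, the prompt leads the LLM to complete a relevant query for a new document.

```python
# InPars-style few-shot prompt construction sketch. The LLM's completion
# after the final "Relevant query:" is taken as a synthetic query that
# can be used as training data for a retrieval model.

FEW_SHOT_EXAMPLES = [
    ("The capital of France is Paris.", "what is the capital of france"),
    ("Water boils at 100 degrees Celsius at sea level.", "boiling point of water"),
]

def build_inpars_prompt(document: str) -> str:
    """Assemble a few-shot prompt ending where the LLM should complete."""
    parts = []
    for doc, query in FEW_SHOT_EXAMPLES:
        parts.append(f"Document: {doc}\nRelevant query: {query}\n")
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n".join(parts)

prompt = build_inpars_prompt("mMARCO extends MS MARCO to 13 languages.")
print(prompt.endswith("Relevant query:"))  # True: the LLM completes from here
```

In use, the prompt is sent to an LLM and the generated query, paired with the document, becomes a (query, relevant document) training example.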
1 code implementation • 19 Dec 2022 • Jayr Pereira, Robson Fidalgo, Roberto Lotufo, Rodrigo Nogueira
This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents.
1 code implementation • 12 Dec 2022 • Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira
We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.
no code implementations • 26 Oct 2022 • Thales Sales Almeida, Thiago Laitz, João Seródio, Luiz Henrique Bonifacio, Roberto Lotufo, Rodrigo Nogueira
We compare our system with Microsoft's Biomedical Search and show that our design choices led to a much more cost-effective system with competitive QPS, while achieving close to state-of-the-art results on a wide range of public benchmarks.
1 code implementation • 29 Sep 2022 • Diedre Carmo, Leticia Rittner, Roberto Lotufo
Airway segmentation in computed tomography images can be used to analyze pulmonary diseases; however, manual segmentation is labor-intensive and relies on expert knowledge.
no code implementations • 27 Sep 2022 • Vitor Jeronymo, Mauricio Nascimento, Roberto Lotufo, Rodrigo Nogueira
Robust 2004 is an information retrieval benchmark whose large number of judgments per query makes it a reliable evaluation dataset.
1 code implementation • COLING 2022 • Hugo Abonizio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
The zero-shot cross-lingual ability of models pretrained on multilingual and even monolingual corpora has spurred many hypotheses to explain this intriguing empirical result.
1 code implementation • 24 Aug 2022 • Mirelle Bueno, Carlos Gemmell, Jeffrey Dalton, Roberto Lotufo, Rodrigo Nogueira
Our experimental results show that generating step-by-step rationales and introducing marker tokens are both required for effective extrapolation.
no code implementations • 9 Aug 2022 • Vitor Jeronymo, Guilherme Rosa, Surya Kallumadi, Roberto Lotufo, Rodrigo Nogueira
In this work we describe our submission to the product ranking task of the Amazon KDD Cup 2022.
1 code implementation • 6 Jun 2022 • Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira
This has made distilled and dense models, due to latency constraints, the go-to choice for deployment in real-world retrieval applications.
Ranked #1 on Citation Prediction on SciDocs (BEIR)
1 code implementation • 30 May 2022 • Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Roberto Lotufo, Rodrigo Nogueira
Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios.
no code implementations • 4 Sep 2021 • Leandro Rodrigues de Souza, Rodrigo Nogueira, Roberto Lotufo
Pretrained multilingual models have become a de facto default approach for zero-shot cross-lingual transfer.
1 code implementation • 31 Aug 2021 • Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira
In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset comprising 13 languages, created using machine translation.
2 code implementations • 14 May 2021 • Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner.
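The transfer protocol described above can be sketched as a small evaluation loop (the `MajorityClassifier` stub and the toy English/Portuguese data are assumptions standing in for a real multilingual model and labeled datasets):

```python
from collections import Counter

# Zero-shot cross-lingual transfer sketch: train on labeled data in one
# language, then evaluate on another language with no further training.

def zero_shot_transfer(model, train_data, test_data) -> float:
    texts, labels = train_data
    model.fit(texts, labels)                      # supervised fine-tuning (source language)
    preds = [model.predict(x) for x, _ in test_data]  # zero-shot inference (target language)
    gold = [y for _, y in test_data]
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

class MajorityClassifier:
    """Trivial stub standing in for a fine-tuned multilingual model."""
    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
    def predict(self, text):
        return self.label

train_en = (["good movie", "bad movie", "great film"], ["pos", "neg", "pos"])
test_pt = [("filme ótimo", "pos"), ("filme ruim", "neg")]
print(zero_shot_transfer(MajorityClassifier(), train_en, test_pt))  # 0.5
```

The point of the sketch is the protocol: `fit` sees only source-language labels, and the target-language test set is touched only at prediction time.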
1 code implementation • 26 Apr 2021 • Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto Lotufo, Rodrigo Nogueira
We describe our single submission to task 1 of COLIEE 2021.
1 code implementation • 19 Sep 2020 • Gabriela Surita, Rodrigo Nogueira, Roberto Lotufo
What are the latent questions on some textual data?
1 code implementation • WMT (EMNLP) 2020 • Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini
Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models.
3 code implementations • 20 Aug 2020 • Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo
In natural language processing (NLP), there is a need for more resources in Portuguese, since much of the data used in the state-of-the-art research is in other languages.
1 code implementation • 14 Feb 2020 • Paulo Finardi, Israel Campiotti, Gustavo Plensack, Rafael Derradi de Souza, Rodrigo Nogueira, Gustavo Pinheiro, Roberto Lotufo
In this work, we propose a model with a novel self-attention mechanism to address electricity theft detection on an imbalanced, realistic dataset of daily electricity consumption provided by the State Grid Corporation of China.
3 code implementations • 14 Jan 2020 • Diedre Carmo, Bruna Silva, Clarissa Yasuda, Letícia Rittner, Roberto Lotufo
We test this methodology alongside other recent deep learning methods in two domains: the HarP test set and an in-house epilepsy dataset containing hippocampus resections, named HCUnicamp.
1 code implementation • 23 Sep 2019 • Fábio Souza, Rodrigo Nogueira, Roberto Lotufo
Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering.
3 code implementations • 12 Feb 2019 • Diedre Carmo, Bruna Silva, Clarissa Yasuda, Letícia Rittner, Roberto Lotufo
Segmentation done by experts is considered the gold standard when evaluating automated methods, but it is a time-consuming and arduous task requiring specialized personnel.
1 code implementation • 13 Apr 2018 • Oeslle Lucena, Roberto Souza, Leticia Rittner, Richard Frayne, Roberto Lotufo
Our use of silver-standard masks reduced the cost of manual annotation, decreased inter- and intra-rater variability, and avoided the CNN segmentation super-specialization towards one specific manual annotation guideline that can occur when gold-standard masks are used.