Search Results for author: Thales Sales Almeida

Found 10 papers, 3 papers with code

TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models

no code implementations • 13 Jan 2025 • Thales Sales Almeida, Giovana Kerche Bonás, João Guilherme Alves Santos, Hugo Abonizio, Rodrigo Nogueira

With a rapidly evolving knowledge landscape and the increasing adoption of large language models, a need has emerged to keep these models continuously updated with current events.

Continual Learning

Sabiá-3 Technical Report

no code implementations • 15 Oct 2024 • Hugo Abonizio, Thales Sales Almeida, Thiago Laitz, Roseval Malaquias Junior, Giovana Kerche Bonás, Rodrigo Nogueira, Ramon Pires

This report presents Sabiá-3, our new flagship language model, and Sabiazinho-3, a more cost-effective sibling.

Language Modeling

Measuring Cross-lingual Transfer in Bytes

1 code implementation • 12 Apr 2024 • Leandro Rodrigues de Souza, Thales Sales Almeida, Roberto Lotufo, Rodrigo Nogueira

We also found evidence that this transfer is not related to language contamination or language proximity, which strengthens the hypothesis that the model also relies on language-agnostic knowledge.

Cross-Lingual Transfer

Sabiá-2: A New Generation of Portuguese Large Language Models

no code implementations • 14 Mar 2024 • Thales Sales Almeida, Hugo Abonizio, Rodrigo Nogueira, Ramon Pires

We introduce Sabiá-2, a family of large language models trained on Portuguese texts.

Math

Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams

1 code implementation • 23 Nov 2023 • Ramon Pires, Thales Sales Almeida, Hugo Abonizio, Rodrigo Nogueira

Recent advancements in language models have showcased human-comparable performance in academic entrance exams.

BLUEX: A benchmark based on Brazilian Leading Universities Entrance eXams

1 code implementation • 11 Jul 2023 • Thales Sales Almeida, Thiago Laitz, Giovana K. Bonás, Rodrigo Nogueira

One common trend in recent studies of language models (LMs) is the use of standardized tests for evaluation.

Natural Language Understanding

Sabiá: Portuguese Large Language Models

no code implementations • 16 Apr 2023 • Ramon Pires, Hugo Abonizio, Thales Sales Almeida, Rodrigo Nogueira

By evaluating on datasets originally conceived in the target language as well as translated ones, we study the contributions of language-specific pretraining in terms of 1) capturing linguistic nuances and structures inherent to the target language, and 2) enriching the model's knowledge about a domain or culture.

Cultural Vocal Bursts Intensity Prediction

NeuralSearchX: Serving a Multi-billion-parameter Reranker for Multilingual Metasearch at a Low Cost

no code implementations • 26 Oct 2022 • Thales Sales Almeida, Thiago Laitz, João Seródio, Luiz Henrique Bonifacio, Roberto Lotufo, Rodrigo Nogueira

We compare our system with Microsoft's Biomedical Search and show that our design choices led to a much more cost-effective system with competitive QPS while achieving close to state-of-the-art results on a wide range of public benchmarks.

Retrieval
