Search Results for author: Rafał Poświata

Found 7 papers, 5 papers with code

A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training

no code implementations · 10 Jul 2024 · Michał Perełkiewicz, Rafał Poświata

This article presents a comprehensive review of the challenges associated with using massive web-mined corpora for the pre-training of large language models (LLMs).

Bias Detection

PL-MTEB: Polish Massive Text Embedding Benchmark

1 code implementation · 16 May 2024 · Rafał Poświata, Sławomir Dadas, Michał Perełkiewicz

In this paper, we introduce the Polish Massive Text Embedding Benchmark (PL-MTEB), a comprehensive benchmark for text embeddings in Polish.

PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods

no code implementations · 20 Feb 2024 · Sławomir Dadas, Michał Perełkiewicz, Rafał Poświata

Our dense models outperform the best solutions available to date, and the use of hybrid methods further improves their performance.

Information Retrieval, Knowledge Distillation, +1

Pre-training Polish Transformer-based Language Models at Scale

1 code implementation · 7 Jun 2020 · Sławomir Dadas, Michał Perełkiewicz, Rafał Poświata

We then evaluate our models on thirteen Polish linguistic tasks, and demonstrate improvements over previous approaches in eleven of them.

Machine Translation, Question Answering, +1

Evaluation of Sentence Representations in Polish

1 code implementation · 25 Oct 2019 · Sławomir Dadas, Michał Perełkiewicz, Rafał Poświata

Methods for learning sentence representations have been actively developed in recent years.

Sentence, Sentence Embeddings
