Search Results for author: Piotr Rybak

Found 10 papers, 2 papers with code

NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems

no code implementations 7 Mar 2024 Martyna Wiącek, Piotr Rybak, Łukasz Pszenny, Alina Wróblewska

Aware of the shortcomings of existing NLPre evaluation approaches, we investigate a novel method of reliable and fair evaluation and performance reporting.

Benchmarking, Dependency Parsing, +2

Transferring BERT Capabilities from High-Resource to Low-Resource Languages Using Vocabulary Matching

no code implementations 22 Feb 2024 Piotr Rybak

Pre-trained language models have revolutionized the natural language understanding landscape, most notably BERT (Bidirectional Encoder Representations from Transformers).

Natural Language Understanding

MAUPQA: Massive Automatically-created Polish Question Answering Dataset

no code implementations 9 May 2023 Piotr Rybak

Recently, open-domain question answering systems have begun to rely heavily on annotated datasets to train neural passage retrievers.

Open-Domain Question Answering, Passage Retrieval, +1

Going beyond research datasets: Novel intent discovery in the industry setting

no code implementations 9 May 2023 Aleksandra Chrabrowa, Tsimur Hadeliya, Dariusz Kajtoch, Robert Mroczkowski, Piotr Rybak

We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.

Clustering, Intent Discovery

PolQA: Polish Question Answering Dataset

no code implementations 17 Dec 2022 Piotr Rybak, Piotr Przybyła, Maciej Ogrodniczuk

Recently proposed systems for open-domain question answering (OpenQA) require large amounts of training data to achieve state-of-the-art performance.

Open-Domain Question Answering, Passage Retrieval, +1

HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish

no code implementations EACL (BSNLP) 2021 Robert Mroczkowski, Piotr Rybak, Alina Wróblewska, Ireneusz Gawlik

Therefore, this paper presents the first ablation study focused on Polish, which, unlike the isolating English language, is a fusional language.

Language Modelling

KLEJ: Comprehensive Benchmark for Polish Language Understanding

1 code implementation ACL 2020 Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik

To ensure a common evaluation scheme and promote models that generalize to different NLU tasks, the benchmark includes datasets from varying domains and applications.

Named Entity Recognition +5

Semi-Supervised Neural System for Tagging, Parsing and Lematization

1 code implementation CoNLL 2018 Piotr Rybak, Alina Wróblewska

This paper describes the ICS PAS system, which took part in the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies.
