no code implementations • LREC 2022 • Tamás Váradi, Bence Nyéki, Svetla Koeva, Marko Tadić, Vanja Štefanec, Maciej Ogrodniczuk, Bartłomiej Nitoń, Piotr Pęzik, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Dan Tufiș, Radovan Garabík, Simon Krek, Andraž Repar
This article presents the current outcomes of the CURLICAT CEF Telecom project, which aims to collect and deeply annotate a set of large corpora from selected domains.
no code implementations • LREC 2022 • Piotr Pęzik, Gosia Krawentek, Sylwia Karasińska, Paweł Wilk, Paulina Rybińska, Anna Cichosz, Angelika Peljak-Łapińska, Mikołaj Deckert, Michał Adamczyk
This paper introduces DiaBiz, a large, annotated, multimodal corpus of Polish telephone conversations conducted in varied business settings, comprising 4036 call centre interactions from nine different domains, i.e. banking, energy services, telecommunications, insurance, medical care, debt collection, tourism, retail and car rental.
no code implementations • 19 Dec 2023 • Piotr Pęzik, Sylwia Karasińska, Anna Cichosz, Łukasz Jałowiecki, Konrad Kaczyński, Małgorzata Krawentek, Karolina Walkusz, Paweł Wilk, Mariusz Kleć, Krzysztof Szklanny, Szymon Marszałkowski
This paper announces the early release of SpokesBiz, a freely available corpus of conversational Polish developed within the CLARIN-BIZ project and comprising over 650 hours of recordings.
no code implementations • 28 Sep 2022 • Piotr Pęzik, Agnieszka Mikołajczyk-Bareła, Adam Wawrzyński, Bartłomiej Nitoń, Maciej Ogrodniczuk
The paper explores the relevance of the Text-To-Text Transfer Transformer language model (T5) for Polish (plT5) to the task of intrinsic and extrinsic keyword extraction from short text passages.
no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Agnieszka Mikołajczyk, Piotr Pęzik, Najim Dehak
Further, we show that by training the model on written text and then applying transfer learning to conversations, we can achieve reasonable performance with less data.