Search Results for author: Piotr Żelasko

Found 27 papers, 8 papers with code

Regularizing Contrastive Predictive Coding for Speech Applications

no code implementations • 12 Apr 2023 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

These representations significantly reduce the amount of labeled data needed for downstream task performance, such as automatic speech recognition.

Acoustic Unit Discovery Automatic Speech Recognition +3

Paper
Add Code

Delay-penalized transducer for low-latency streaming ASR

1 code implementation • 31 Oct 2022 • Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey

In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

1,038

Paper
Code

Fast and parallel decoding for transducer

1 code implementation • 31 Oct 2022 • Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey

In this work, we introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences; we also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step in transducer decoding, making it more efficient to decode in parallel with batches.

speech-recognition Speech Recognition

771

Paper
Code

Time-domain speech super-resolution with GAN based modeling for telephony speaker verification

no code implementations • 4 Sep 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak

We show that our bandwidth extension leads to phenomena such as a shift of telephone (test) embeddings towards wideband (train) signals, a negative correlation of perceptual quality with downstream performance, and condition-independent score calibration.

Bandwidth Extension Data Augmentation +3

Paper
Add Code

Non-Contrastive Self-Supervised Learning of Utterance-Level Speech Representations

1 code implementation • 10 Aug 2022 • Jaejin Cho, Raghavendra Pappagari, Piotr Żelasko, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak

This paper applies a non-contrastive self-supervised learning method on an unlabeled speech corpus to learn utterance-level embeddings.

Emotion Recognition Self-Supervised Learning +1

Paper
Code

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

1 code implementation • 26 Jan 2022 • Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

In this paper, we 1) investigate the influence of different factors (i. e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

no code implementations • 5 Oct 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework to model the signal structure at a higher level, e. g., phone level.

Boundary Detection Representation Learning +1

Paper
Add Code

Joint prediction of truecasing and punctuation for conversational speech in low-resource scenarios

no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Agnieszka Mikołajczyk, Piotr Pęzik, Najim Dehak

Further, we show that by training the model in the written text domain and then transfer learning to conversations, we can achieve reasonable performance with less data.

Transfer Learning

Paper
Add Code

Beyond Isolated Utterances: Conversational Emotion Recognition

no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

While most of the current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER) which deals with recognizing emotions in conversations.

Speech Emotion Recognition

Paper
Add Code

Representation Learning to Classify and Detect Adversarial Attacks against Speaker and Speech Recognition Systems

no code implementations • 9 Jul 2021 • Jesús Villalba, Sonal Joshi, Piotr Żelasko, Najim Dehak

Also, representations trained to classify attacks against speaker identification can be used also to classify attacks against speaker verification and speech recognition.

Representation Learning Speaker Identification +4

Paper
Add Code

What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition

1 code implementation • 5 Jul 2021 • Piotr Żelasko, Raghavendra Pappagari, Najim Dehak

Dialog acts can be interpreted as the atomic units of a conversation, more fine-grained than utterances, characterized by a specific communicative function.

Segmentation Specificity +1

Paper
Code

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

no code implementations • 3 Jun 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level e. g. at the phoneme level.

Paper
Add Code

Deep Feature CycleGANs: Speaker Identity Preserving Non-parallel Microphone-Telephone Domain Adaptation for Speaker Verification

no code implementations • 3 Apr 2021 • Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

We investigate it for adapt microphone speech to the telephone domain.

Domain Adaptation Speaker Verification +1

Paper
Add Code

Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

1 code implementation • 2 Apr 2021 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Odette Scharenborg

In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one to create a subword-discriminative representation that is more language-independent.

Acoustic Unit Discovery Clustering

Paper
Code

Adversarial Attacks and Defenses for Speech Recognition Systems

no code implementations • 31 Mar 2021 • Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.

Adversarial Robustness Automatic Speech Recognition +2

Paper
Add Code

Study of Pre-processing Defenses against Adversarial Attacks on State-of-the-art Speaker Recognition Systems

no code implementations • 22 Jan 2021 • Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

Such attacks pose severe security risks, making it vital to deep-dive and understand how much the state-of-the-art SR systems are vulnerable to these attacks.

Speaker Recognition

Paper
Add Code

Focus on the present: a regularization method for the ASR source-target attention layer

no code implementations • 2 Nov 2020 • Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training.

speech-recognition Speech Recognition

Paper
Add Code

CopyPaste: An Augmentation Method for Speech Emotion Recognition

no code implementations • 27 Oct 2020 • Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Data augmentation is a widely used strategy for training robust machine learning models.

Data Augmentation Speaker Recognition +2

Paper
Add Code

How Phonotactics Affect Multilingual and Zero-shot ASR Performance

1 code implementation • 22 Oct 2020 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

WER we are and WER we think we are

no code implementations • Findings of the Association for Computational Linguistics 2020 • Piotr Szymański, Piotr Żelasko, Mikolaj Morzy, Adrian Szymczak, Marzena Żyła-Hoppe, Joanna Banaszczak, Lukasz Augustyniak, Jan Mizgajski, Yishay Carmiel

Natural language processing of conversational speech requires the availability of high-quality transcripts.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

no code implementations • 26 Jul 2020 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak

We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments.

Segmentation

Paper
Add Code

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

no code implementations • 16 May 2020 • Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Hierarchical Transformers for Long Document Classification

3 code implementations • 23 Oct 2019 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.

Classification Document Classification +3

Paper
Code

Avaya Conversational Intelligence: A Real-Time System for Spoken Language Understanding in Human-Human Call Center Conversations

no code implementations • 2 Sep 2019 • Jan Mizgajski, Adrian Szymczak, Robert Głowski, Piotr Szymański, Piotr Żelasko, Łukasz Augustyniak, Mikołaj Morzy, Yishay Carmiel, Jeff Hodson, Łukasz Wójciak, Daniel Smoczyk, Adam Wróbel, Bartosz Borowik, Adam Artajew, Marcin Baran, Cezary Kwiatkowski, Marzena Żyła-Hoppe

Avaya Conversational Intelligence(ACI) is an end-to-end, cloud-based solution for real-time Spoken Language Understanding for call centers.

Abstractive Text Summarization Intent Recognition +4

Paper
Add Code

Towards Better Understanding of Spontaneous Conversations: Overcoming Automatic Speech Recognition Errors With Intent Recognition

no code implementations • 21 Aug 2019 • Piotr Żelasko, Jan Mizgajski, Mikołaj Morzy, Adrian Szymczak, Piotr Szymański, Łukasz Augustyniak, Yishay Carmiel

In this paper, we present a method for correcting automatic speech recognition (ASR) errors using a finite state transducer (FST) intent recognition framework.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Punctuation Prediction Model for Conversational Speech

no code implementations • 2 Jul 2018 • Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, Najim Dehak

The models are trained on the Fisher corpus which includes punctuation annotation.

Paper
Add Code

Expanding Abbreviations in a Strongly Inflected Language: Are Morphosyntactic Tags Sufficient?

no code implementations • 20 Aug 2017 • Piotr Żelasko

In this paper, the problem of recovery of morphological information lost in abbreviated forms is addressed with a focus on highly inflected languages.

TAG

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.