Search Results for author: Ilya Sklyar

Found 7 papers, 0 papers with code

Anatomy of Industrial Scale Multilingual ASR

no code implementations • 15 Apr 2024 • Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs.

Anatomy Automatic Speech Recognition +3

Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech

no code implementations • 10 May 2022 • Ilya Sklyar, Anna Piunova, Christian Osendorfer

Finally, we establish a novel framework for segmentation analysis of multi-party conversations through emission latency metrics.

Segmentation speech-recognition +3

Multi-turn RNN-T for streaming recognition of multi-party speech

no code implementations • 19 Dec 2021 • Ilya Sklyar, Anna Piunova, Xianrui Zheng, YuLan Liu

Second, we propose a novel multi-turn RNN-T (MT-RNN-T) model with an overlap-based target arrangement strategy that generalizes to an arbitrary number of speakers without changes in the model architecture.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Streaming Multi-speaker ASR with RNN-T

no code implementations • 23 Nov 2020 • Ilya Sklyar, Anna Piunova, YuLan Liu

Recent research shows end-to-end ASR systems can recognize overlapped speech from multiple speakers.

speech-recognition Speech Recognition

Improving RNN-T ASR Accuracy Using Context Audio

no code implementations • 20 Nov 2020 • Andreas Schwarz, Ilya Sklyar, Simon Wiesler

We present a training scheme for streaming automatic speech recognition (ASR) based on recurrent neural network transducers (RNN-T) which allows the encoder network to learn to exploit context audio from a stream, using segmented or partially labeled sequences of the stream during training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

no code implementations • 9 May 2019 • Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney

In a more realistic ASR scenario, the audio signal contains significant portions of single-speaker speech, and only part of the signal contains speech of multiple competing speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
