Search Results for author: Ilya Sklyar

Found 7 papers, 0 papers with code

Anatomy of Industrial Scale Multilingual ASR

no code implementations • 15 Apr 2024 • Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs.

Anatomy Automatic Speech Recognition +3

Two-pass Endpoint Detection for Speech Recognition

no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.

speech-recognition Speech Recognition

Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech

no code implementations • 10 May 2022 • Ilya Sklyar, Anna Piunova, Christian Osendorfer

Finally, we establish a novel framework for segmentation analysis of multi-party conversations through emission latency metrics.

Segmentation speech-recognition +3

Multi-turn RNN-T for streaming recognition of multi-party speech

no code implementations • 19 Dec 2021 • Ilya Sklyar, Anna Piunova, Xianrui Zheng, YuLan Liu

Second, we propose a novel multi-turn RNN-T (MT-RNN-T) model with an overlap-based target arrangement strategy that generalizes to an arbitrary number of speakers without changes in the model architecture.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Streaming Multi-speaker ASR with RNN-T

no code implementations • 23 Nov 2020 • Ilya Sklyar, Anna Piunova, YuLan Liu

Recent research shows end-to-end ASR systems can recognize overlapped speech from multiple speakers.

speech-recognition Speech Recognition

Improving RNN-T ASR Accuracy Using Context Audio

no code implementations • 20 Nov 2020 • Andreas Schwarz, Ilya Sklyar, Simon Wiesler

We present a training scheme for streaming automatic speech recognition (ASR) based on recurrent neural network transducers (RNN-T) which allows the encoder network to learn to exploit context audio from a stream, using segmented or partially labeled sequences of the stream during training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

no code implementations • 9 May 2019 • Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney

In a more realistic ASR scenario the audio signal contains significant portions of single-speaker speech and only part of the signal contains speech of multiple competing speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
