no code implementations • 15 Apr 2024 • Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka
This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs.
no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow
Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.
no code implementations • 10 May 2022 • Ilya Sklyar, Anna Piunova, Christian Osendorfer
Finally, we establish a novel framework for segmentation analysis of multi-party conversations through emission latency metrics.
no code implementations • 19 Dec 2021 • Ilya Sklyar, Anna Piunova, Xianrui Zheng, YuLan Liu
Second, we propose a novel multi-turn RNN-T (MT-RNN-T) model with an overlap-based target arrangement strategy that generalizes to an arbitrary number of speakers without changes in the model architecture.
Automatic Speech Recognition (ASR)
no code implementations • 23 Nov 2020 • Ilya Sklyar, Anna Piunova, YuLan Liu
Recent research shows that end-to-end ASR systems can recognize overlapped speech from multiple speakers.
no code implementations • 20 Nov 2020 • Andreas Schwarz, Ilya Sklyar, Simon Wiesler
We present a training scheme for streaming automatic speech recognition (ASR) based on recurrent neural network transducers (RNN-T). The scheme allows the encoder network to learn to exploit context audio from a stream, using segmented or partially labeled sequences of the stream during training.
Automatic Speech Recognition (ASR)
no code implementations • 9 May 2019 • Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney
In a more realistic ASR scenario, the audio signal contains significant portions of single-speaker speech, and only part of the signal contains speech of multiple competing speakers.
Automatic Speech Recognition (ASR)