1 code implementation • 7 Oct 2021 • Oktai Tatanov, Stanislav Beliaev, Boris Ginsburg
This paper describes Mixer-TTS, a non-autoregressive model for mel-spectrogram generation.
1 code implementation • 16 Apr 2021 • Stanislav Beliaev, Boris Ginsburg
We propose TalkNet, a non-autoregressive convolutional neural model for speech synthesis with explicit pitch and duration prediction.
15 code implementations • 22 Oct 2019 • Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang
We propose a new end-to-end neural acoustic model for automatic speech recognition.
Ranked #33 on Speech Recognition on LibriSpeech test-clean
Speech Recognition • Audio and Speech Processing
1 code implementation • 14 Sep 2019 • Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen
NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition.
Ranked #1 on Speech Recognition on Common Voice Spanish (using extra training data)
Automatic Speech Recognition (ASR)