no code implementations • IWSLT (ACL) 2022 • Oleksii Hrinchuk, Vahid Noroozi, Ashwinkumar Ganesan, Sarah Campbell, Sandeep Subramanian, Somshubra Majumdar, Oleksii Kuchaiev
Our cascade system consists of 1) a Conformer RNN-T automatic speech recognition model, 2) a punctuation-and-capitalization model based on a pre-trained T5 encoder, and 3) an ensemble of Transformer neural machine translation models fine-tuned on TED talks.
Automatic Speech Recognition (ASR)
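Below is a minimal sketch of the data flow through such a cascade. All object and method names (transcribe, restore, translate) are hypothetical placeholders, not the paper's or NeMo's actual API.

```python
def cascade_translate(audio, asr_model, punct_cap_model, nmt_ensemble):
    """Illustrative cascade speech translation pipeline (placeholder API)."""
    # 1) Conformer RNN-T ASR produces lower-cased, unpunctuated text.
    raw_text = asr_model.transcribe(audio)

    # 2) A punctuation-and-capitalization model (built on a pre-trained T5
    #    encoder) restores the orthography the translation models expect.
    clean_text = punct_cap_model.restore(raw_text)

    # 3) An ensemble of Transformer NMT models fine-tuned on TED talks
    #    translates the cleaned transcript; the ensemble object is assumed
    #    to average model scores internally.
    return nmt_ensemble.translate(clean_text)
```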
no code implementations • WMT (EMNLP) 2021 • Sandeep Subramanian, Oleksii Hrinchuk, Virginia Adams, Oleksii Kuchaiev
This paper provides an overview of NVIDIA NeMo’s neural machine translation systems for the constrained data track of the WMT21 News and Biomedical Shared Translation Tasks.
no code implementations • 2 Jun 2022 • Virginia Adams, Sandeep Subramanian, Mike Chrzanowski, Oleksii Hrinchuk, Oleksii Kuchaiev
General translation models often still struggle to generate accurate translations in specialized domains.
1 code implementation • 5 Apr 2021 • Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.
Ranked #1 on Speech Recognition on SPGISpeech
15 code implementations • 22 Oct 2019 • Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang
We propose a new end-to-end neural acoustic model for automatic speech recognition.
Ranked #29 on Speech Recognition on LibriSpeech test-clean
Speech Recognition
Audio and Speech Processing
1 code implementation • 14 Sep 2019 • Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen
NeMo (Neural Modules) is a framework-agnostic Python toolkit for creating AI applications through re-usability, abstraction, and composition.
Ranked #1 on Speech Recognition on Common Voice Spanish (using extra training data)
Automatic Speech Recognition (ASR)
3 code implementations • 27 May 2019 • Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen
We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.
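The update rule below is a minimal NumPy sketch of a NovoGrad-style step under the usual reading of the abstract: a per-layer (scalar) second moment of the gradient norm, decoupled weight decay added to the normalized gradient, and heavy-ball momentum on top. Hyperparameter defaults are illustrative; this is not the reference implementation.

```python
import numpy as np

def novograd_step(params, grads, m, v, lr=0.01, beta1=0.95, beta2=0.98,
                  weight_decay=0.001, eps=1e-8):
    """One NovoGrad-style update. params, grads, m are dicts of arrays;
    v is a dict of per-layer scalars (second moment of gradient norms)."""
    for name in params:
        g = grads[name]
        g_norm_sq = float(np.sum(g * g))

        # Layer-wise second moment of the gradient norm (one scalar per layer).
        if v[name] == 0.0:
            v[name] = g_norm_sq                      # first step: initialize
        else:
            v[name] = beta2 * v[name] + (1.0 - beta2) * g_norm_sq

        # Normalize the gradient by the layer's norm estimate and add
        # decoupled weight decay directly to the normalized gradient.
        g_hat = g / (np.sqrt(v[name]) + eps) + weight_decay * params[name]

        # Heavy-ball momentum on the normalized gradient, then an SGD step.
        m[name] = beta1 * m[name] + g_hat
        params[name] -= lr * m[name]
    return params, m, v
```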
8 code implementations • 5 Apr 2019 • Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde
In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.
Ranked #3 on Speech Recognition on Hub5'00 SwitchBoard
no code implementations • WS 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Carl Case, Paulius Micikevicius
We present OpenSeq2Seq, an open-source toolkit for training sequence-to-sequence models.
Automatic Speech Recognition (ASR)
4 code implementations • 25 May 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius
We present OpenSeq2Seq, a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
Automatic Speech Recognition (ASR)
8 code implementations • ICLR 2018 • Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
Using this approach, we can reduce the memory consumption of deep learning models by nearly 2x.
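The core recipe behind that saving is to keep the working weights, activations, and gradients in FP16 while an FP32 master copy of the weights absorbs the updates, with the loss scaled so small gradients stay representable. The loop below is a condensed PyTorch-style sketch of that idea; the fixed loss scale and variable names are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def mixed_precision_step(model_fp16, master_params_fp32, optimizer,
                         batch, loss_fn, loss_scale=1024.0):
    """One training step with FP16 compute and an FP32 master weight copy.
    The optimizer must have been constructed over master_params_fp32."""
    inputs, targets = batch

    # Forward and backward in half precision, with the loss scaled up so
    # that small gradient values do not underflow in FP16.
    outputs = model_fp16(inputs.half())
    loss = loss_fn(outputs.float(), targets)
    (loss * loss_scale).backward()

    # Move the FP16 gradients onto the FP32 master copy and unscale them.
    for p16, p32 in zip(model_fp16.parameters(), master_params_fp32):
        p32.grad = p16.grad.detach().float() / loss_scale
        p16.grad = None

    # Update the master weights in FP32, then copy them back to FP16.
    optimizer.step()
    optimizer.zero_grad()
    with torch.no_grad():
        for p16, p32 in zip(model_fp16.parameters(), master_params_fp32):
            p16.copy_(p32.half())

    return loss.item()
```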
9 code implementations • 5 Aug 2017 • Oleksii Kuchaiev, Boris Ginsburg
Our model is based on a deep autoencoder with 6 layers and is trained end-to-end without any layer-wise pre-training.
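As a rough illustration, a 6-layer autoencoder of this shape can be written as three encoder layers mirrored by three decoder layers and trained end-to-end with a single reconstruction loss; the layer widths and activation below are placeholders, not the paper's configuration.

```python
import torch.nn as nn

class SixLayerAutoencoder(nn.Module):
    """Generic 6-layer autoencoder: 3 encoder + 3 decoder linear layers,
    trained end-to-end (no layer-wise pre-training). Sizes are illustrative."""
    def __init__(self, n_inputs, hidden=(512, 512, 1024)):
        super().__init__()
        dims = [n_inputs, *hidden]
        layers = []
        # Encoder: n_inputs -> 512 -> 512 -> 1024.
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.SELU()]
        # Decoder mirrors the encoder: 1024 -> 512 -> 512 -> n_inputs.
        rev = dims[::-1]
        for d_in, d_out in zip(rev[:-1], rev[1:]):
            layers += [nn.Linear(d_in, d_out), nn.SELU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Reconstruct the input; training minimizes a reconstruction loss
        # (e.g. MSE on the observed entries) end-to-end.
        return self.net(x)
```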
2 code implementations • 31 Mar 2017 • Oleksii Kuchaiev, Boris Ginsburg
We present two simple ways of reducing the number of parameters and accelerating the training of large Long Short-Term Memory (LSTM) networks: the first is "matrix factorization by design", which replaces the LSTM weight matrix with the product of two smaller matrices, and the second partitions the LSTM matrix, together with its inputs and states, into independent groups.
Ranked #20 on Language Modelling on One Billion Word
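The parameter saving behind both ideas is easiest to see on the LSTM's combined weight matrix of shape (4 * hidden, input + hidden). The snippet below is a back-of-the-envelope comparison with illustrative sizes; the rank and group count are arbitrary choices, not the paper's settings.

```python
# Back-of-the-envelope parameter counts for the two ideas, with
# illustrative sizes (rank and group count are not the paper's settings).
input_size, hidden_size = 1024, 1024
rank, groups = 256, 4

# Standard LSTM: one weight matrix of shape (4*hidden, input + hidden).
baseline = 4 * hidden_size * (input_size + hidden_size)

# 1) "Matrix factorization by design": W is replaced by W2 @ W1 with
#    W1 of shape (rank, input + hidden) and W2 of shape (4*hidden, rank).
factorized = rank * (input_size + hidden_size) + 4 * hidden_size * rank

# 2) Group LSTM: inputs and states are split into independent groups, each
#    with its own (4*hidden/groups, (input + hidden)/groups) matrix.
grouped = groups * (4 * hidden_size // groups) * ((input_size + hidden_size) // groups)

print(f"baseline:   {baseline:,}")    # 8,388,608 parameters
print(f"factorized: {factorized:,}")  # 1,572,864 parameters
print(f"grouped:    {grouped:,}")     # 2,097,152 parameters
```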