Search Results for author: Oleksii Kuchaiev

Found 22 papers, 12 papers with code

NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022

no code implementations IWSLT (ACL) 2022 Oleksii Hrinchuk, Vahid Noroozi, Ashwinkumar Ganesan, Sarah Campbell, Sandeep Subramanian, Somshubra Majumdar, Oleksii Kuchaiev

Our cascade system consists of 1) a Conformer RNN-T automatic speech recognition model, 2) a punctuation and capitalization model based on a pre-trained T5 encoder, and 3) an ensemble of Transformer neural machine translation models fine-tuned on TED talks.

Automatic Speech Recognition (ASR) +4
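
A minimal sketch of how such a cascade pipeline can be wired together is shown below; the three stage functions are hypothetical placeholders rather than NeMo API calls.

```python
from typing import Callable, List


def cascade_translate(
    audio: bytes,
    asr: Callable[[bytes], str],                # 1) RNN-T ASR: audio -> raw transcript
    punctuate: Callable[[str], str],            # 2) T5-based punctuation/capitalization
    nmt_ensemble: List[Callable[[str], str]],   # 3) ensemble of Transformer NMT models
    rescore: Callable[[List[str]], str] = lambda hyps: hyps[0],
) -> str:
    transcript = asr(audio)                     # lowercase, unpunctuated words
    formatted = punctuate(transcript)           # restore casing and punctuation
    hypotheses = [model(formatted) for model in nmt_ensemble]
    return rescore(hypotheses)                  # combine or select among ensemble outputs
```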

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM

1 code implementation 16 Nov 2023 Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev

To alleviate this problem, we collect HelpSteer, a multi-attribute helpfulness dataset annotated for the various aspects that make responses helpful.

Attribute
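
As an illustration, the snippet below loads the dataset and prints the per-response attribute scores; it assumes the release is published as nvidia/HelpSteer on the Hugging Face Hub with the attribute names used in the paper.

```python
from datasets import load_dataset

# Assumption: the dataset is available as "nvidia/HelpSteer" on the Hugging Face Hub.
ds = load_dataset("nvidia/HelpSteer", split="train")
example = ds[0]
print(example["prompt"][:80], "...")
for attr in ("helpfulness", "correctness", "coherence", "complexity", "verbosity"):
    print(attr, example.get(attr))
```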

Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying

no code implementations 16 Nov 2023 Adithya Renduchintala, Tugrul Konuk, Oleksii Kuchaiev

We propose Tied-LoRA, a simple paradigm that utilizes weight tying and selective training to further increase the parameter efficiency of the Low-rank adaptation (LoRA) method.
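
The sketch below illustrates the weight-tying idea under simple assumptions (a single shared low-rank pair reused by every adapted layer, plus a small per-layer trainable scaling vector); it is not the authors' implementation.

```python
import torch
import torch.nn as nn


class TiedLoRALinear(nn.Module):
    """Frozen base projection plus a low-rank update whose A/B factors are shared."""

    def __init__(self, base: nn.Linear, shared_A: nn.Parameter, shared_B: nn.Parameter, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze the pretrained weights
            p.requires_grad_(False)
        self.A = shared_A                       # (rank, in_features), tied across layers
        self.B = shared_B                       # (out_features, rank), tied across layers
        self.scale = nn.Parameter(torch.ones(shared_A.shape[0]))  # per-layer trainable scaling
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = ((x @ self.A.t()) * self.scale) @ self.B.t()
        return self.base(x) + self.alpha * delta


# One shared low-rank pair serves every adapted layer.
d, r = 512, 8
shared_A = nn.Parameter(torch.randn(r, d) * 0.01)
shared_B = nn.Parameter(torch.zeros(d, r))
adapted = [TiedLoRALinear(nn.Linear(d, d), shared_A, shared_B) for _ in range(12)]
print(adapted[0](torch.randn(2, d)).shape)      # torch.Size([2, 512])
```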

SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

1 code implementation 9 Oct 2023 Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev

Model alignment with human preferences is an essential step in making Large Language Models (LLMs) helpful and consistent with human values.

Attribute
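
A hypothetical sketch of attribute-conditioned formatting in the spirit of SteerLM is shown below; the actual template and attribute names may differ from the released implementation.

```python
# Hypothetical template: desired attribute values are serialized into the prompt so
# the model learns to generate responses conditioned on them.
def format_steerlm_example(prompt: str, response: str, attributes: dict) -> str:
    attr_str = ",".join(f"{name}:{value}" for name, value in sorted(attributes.items()))
    return (
        f"<prompt>{prompt}</prompt>"
        f"<attributes>{attr_str}</attributes>"
        f"<response>{response}</response>"
    )


print(format_steerlm_example(
    "Explain weight tying in one sentence.",
    "Weight tying shares one parameter matrix across several layers.",
    {"helpfulness": 4, "correctness": 4, "verbosity": 1},
))
```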

Leveraging Synthetic Targets for Machine Translation

no code implementations 7 May 2023 Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev

In this work, we provide a recipe for training machine translation models in a limited resource setting by leveraging synthetic target data generated using a large pre-trained model.

Machine Translation Translation
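
A minimal sketch of the recipe, assuming a placeholder teacher_translate function that wraps the large pre-trained model:

```python
from typing import Callable, Iterable, List, Tuple


def build_synthetic_corpus(
    sources: Iterable[str],
    teacher_translate: Callable[[str], str],    # placeholder for a large pre-trained NMT model
) -> List[Tuple[str, str]]:
    # Pair each source sentence with a teacher-generated target; the downstream model
    # is then trained on these (source, synthetic target) pairs instead of
    # (or in addition to) the original references.
    return [(src, teacher_translate(src)) for src in sources]
```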

NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21

no code implementations 16 Nov 2021 Sandeep Subramanian, Oleksii Hrinchuk, Virginia Adams, Oleksii Kuchaiev

This paper provides an overview of NVIDIA NeMo's neural machine translation systems for the constrained data track of the WMT21 News and Biomedical Shared Translation Tasks.

Data Augmentation Knowledge Distillation +3

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

1 code implementation 5 Apr 2021 Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko

In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.

Speech Recognition
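
The contrast between a conventional ASR target and a fully formatted one can be illustrated with the small sketch below (the example sentence is illustrative, not drawn from the corpus).

```python
import re


def conventional_target(formatted: str) -> str:
    # Lowercase and strip punctuation/symbols, as in typical uncased training targets.
    # (Real pipelines also verbalize numbers and other non-standard words.)
    return re.sub(r"\s+", " ", re.sub(r"[^a-z' ]", " ", formatted.lower())).strip()


formatted = "Revenue grew 12% year-over-year, to $4.2 billion."
print(conventional_target(formatted))   # uncased, punctuation stripped
print(formatted)                        # fully formatted target predicted end-to-end
```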

Jasper: An End-to-End Convolutional Neural Acoustic Model

10 code implementations 5 Apr 2019 Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.

Language Modelling Speech Recognition
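
Below is a minimal sketch of a single Jasper-style block (repeated 1D convolution sub-blocks with batch norm, ReLU, dropout, and a residual connection); it is illustrative only and omits most of the full architecture described in the paper.

```python
import torch
import torch.nn as nn


class JasperBlock(nn.Module):
    """One Jasper-style block: repeated (conv1d -> batch norm) sub-blocks, ReLU and
    dropout after each, with a residual connection added before the last activation."""

    def __init__(self, channels: int, kernel_size: int = 11, repeat: int = 5, dropout: float = 0.2):
        super().__init__()
        pad = kernel_size // 2
        self.sub_blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size, padding=pad),
                nn.BatchNorm1d(channels),
            )
            for _ in range(repeat)
        )
        self.act = nn.Sequential(nn.ReLU(), nn.Dropout(dropout))
        self.res = nn.Sequential(nn.Conv1d(channels, channels, 1), nn.BatchNorm1d(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, channels, time)
        out = x
        for i, sub in enumerate(self.sub_blocks):
            out = sub(out)
            if i == len(self.sub_blocks) - 1:
                out = out + self.res(x)                    # residual before the last activation
            out = self.act(out)
        return out


features = torch.randn(2, 256, 100)    # (batch, feature channels, time frames)
print(JasperBlock(256)(features).shape)
```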

Training Deep AutoEncoders for Recommender Systems

no code implementations ICLR 2018 Oleksii Kuchaiev, Boris Ginsburg

Our model is based on a deep autoencoder with 6 layers and is trained end-to-end without any layer-wise pre-training.

Recommendation Systems

Training Deep AutoEncoders for Collaborative Filtering

10 code implementations 5 Aug 2017 Oleksii Kuchaiev, Boris Ginsburg

Our model is based on a deep autoencoder with 6 layers and is trained end-to-end without any layer-wise pre-training.

Collaborative Filtering Recommendation Systems
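
A minimal re-implementation sketch of the idea (not the authors' released code, and with placeholder layer sizes and dropout rate) is shown below: the input is a user's sparse rating vector, and the reconstruction loss is computed only over observed ratings.

```python
import torch
import torch.nn as nn


class RatingAutoEncoder(nn.Module):
    """6-layer encoder/decoder over a user's item-rating vector."""

    def __init__(self, n_items: int, hidden: int = 512, code: int = 512, dropout: float = 0.8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_items, hidden), nn.SELU(),
            nn.Linear(hidden, hidden), nn.SELU(),
            nn.Linear(hidden, code), nn.SELU(),
        )
        self.decoder = nn.Sequential(
            nn.Dropout(dropout),                # dropout applied to the code (rate is a placeholder)
            nn.Linear(code, hidden), nn.SELU(),
            nn.Linear(hidden, hidden), nn.SELU(),
            nn.Linear(hidden, n_items),
        )

    def forward(self, ratings: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(ratings))


def masked_mse(pred: torch.Tensor, ratings: torch.Tensor) -> torch.Tensor:
    mask = (ratings > 0).float()                # compute the loss on observed ratings only
    return ((pred - ratings) ** 2 * mask).sum() / mask.sum().clamp(min=1)
```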

Factorization tricks for LSTM networks

2 code implementations 31 Mar 2017 Oleksii Kuchaiev, Boris Ginsburg

We present two simple ways of reducing the number of parameters and accelerating the training of large Long Short-Term Memory (LSTM) networks: the first is "matrix factorization by design" of the LSTM matrix into the product of two smaller matrices, and the second is partitioning of the LSTM matrix, its inputs, and its states into independent groups.

Language Modelling
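
The first trick can be sketched as follows, assuming a factorized cell in which the large LSTM projection is replaced by two smaller matrices applied in sequence (an illustrative re-implementation, not the paper's code).

```python
import torch
import torch.nn as nn


class FactorizedLSTMCell(nn.Module):
    """LSTM cell whose 4h x (input+h) projection is factorized into two smaller matrices."""

    def __init__(self, input_size: int, hidden_size: int, rank: int):
        super().__init__()
        self.proj_down = nn.Linear(input_size + hidden_size, rank, bias=False)
        self.proj_up = nn.Linear(rank, 4 * hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x: torch.Tensor, state):
        h, c = state
        gates = self.proj_up(self.proj_down(torch.cat([x, h], dim=-1)))
        i, f, g, o = gates.chunk(4, dim=-1)
        c_new = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, c_new


cell = FactorizedLSTMCell(1024, 1024, rank=128)   # far fewer parameters than the full projection
h = c = torch.zeros(2, 1024)
h, c = cell(torch.randn(2, 1024), (h, c))
print(h.shape)
```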
