Search Results for author: Vitaly Lavrukhin

Found 27 papers, 10 papers with code

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer

no code implementations | 10 Jan 2025 | Vladimir Bataev, Subhankar Ghosh, Vitaly Lavrukhin, Jason Li

The proposed system first uses a transducer architecture to learn monotonic alignments between tokenized text and speech codec tokens for the first codebook.

Speech Recognition, +2 more

Methods to Increase the Amount of Data for Speech Recognition for Low Resource Languages

no code implementations | 8 Jan 2025 | Alexan Ayrapetyan, Sofia Kostandian, Ara Yeroyan, Mher Yerznkanyan, Nikolay Karpov, Nune Tadevosyan, Vitaly Lavrukhin, Boris Ginsburg

This study explores methods to increase data volume for low-resource languages using techniques such as crowdsourcing, pseudo-labeling, and advanced data preprocessing, together with permissively licensed data sources such as audiobooks, Common Voice, and YouTube.

Speech Recognition

EMMeTT: Efficient Multimodal Machine Translation Training

no code implementations | 20 Sep 2024 | Piotr Żelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg

This work focuses on neural machine translation (NMT) and proposes a joint multimodal training regime of Speech-LLM to include automatic speech translation (AST).

Automatic Speech Translation, Decoder, +3 more

Chain-of-Thought Prompting for Speech Translation

no code implementations | 17 Sep 2024 | Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Żelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

Building on the success of text-based LLMs, recent research has adapted these models to use speech embeddings for prompting, resulting in Speech-LLM models that exhibit strong performance in automatic speech recognition (ASR) and automatic speech translation (AST).

Automatic Speech Recognition (ASR), +4 more

Label-Looping: Highly Efficient Decoding for Transducers

1 code implementation | 10 Jun 2024 | Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg

This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models.

Speech Recognition

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models

no code implementations | 4 Oct 2023 | Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format.

Automatic Speech Recognition (ASR), +1 more

A Chat About Boring Problems: Studying GPT-based text normalization

no code implementations | 23 Sep 2023 | Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg

Through this new framework, we can identify strengths and weaknesses of GPT-based TN, opening opportunities for future work.

Prompt Engineering

Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio

2 code implementations | 9 Aug 2023 | Yang Zhang, Krishna C. Puvvada, Vitaly Lavrukhin, Boris Ginsburg

We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR).

Automatic Speech Recognition, Speech Recognition, +1 more

Confidence-based Ensembles of End-to-End Speech Recognition Models

no code implementations | 27 Jun 2023 | Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg

Second, we demonstrate that it is possible to combine base and adapted models to achieve strong results on both original and target data.

Language Identification, Model Selection, +2 more

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

1 code implementation | 27 Feb 2023 | Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg

We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both.

Automatic Speech Recognition (ASR), +2 more

A Toolbox for Construction and Analysis of Speech Datasets

1 code implementation | 11 Apr 2021 | Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets.

Automatic Speech Recognition (ASR), +2 more

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

1 code implementation | 5 Apr 2021 | Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko

In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.

Speech Recognition

Hi-Fi Multi-Speaker English TTS Dataset

no code implementations | 3 Apr 2021 | Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang

This paper introduces a new multi-speaker English dataset for training text-to-speech models.

Text to Speech

Jasper: An End-to-End Convolutional Neural Acoustic Model

10 code implementations | 5 Apr 2019 | Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.

Decoder, Language Modeling, +2 more

Training Neural Speech Recognition Systems with Synthetic Speech Augmentation

no code implementations | 2 Nov 2018 | Jason Li, Ravi Gadde, Boris Ginsburg, Vitaly Lavrukhin

Building an accurate automatic speech recognition (ASR) system requires a large dataset that contains many hours of labeled speech samples produced by a diverse set of speakers.

Automatic Speech Recognition (ASR), +3 more
