Search Results for author: Oleg Rybakov

Found 14 papers, 4 papers with code

USM RNN-T model weights binarization

no code implementations • 5 Jun 2024 • Oleg Rybakov, Dmitriy Serdyuk, Chengjian Zheng

Large-scale universal speech models (USM) are already used in production.

Binarization, Quantization

SimulTron: On-Device Simultaneous Speech to Speech Translation

no code implementations • 4 Jun 2024 • Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich

Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages.

Simultaneous Speech-to-Speech Translation, Speech-to-Speech Translation, +1

2-bit Conformer quantization for automatic speech recognition

no code implementations • 26 May 2023 • Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He

With the large-scale training data, we obtain a 2-bit Conformer model with over 40% model size reduction against the 4-bit version at the cost of 17% relative word error rate degradation.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

no code implementations • 24 May 2023 • David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

With the rapid increase in the size of neural networks, model compression has become an important area of research.

Machine Translation, Model Compression, +3

STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition

no code implementations • 2 Feb 2023 • Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh

Recent innovations on hardware (e.g., Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference.

Machine Translation

Streaming Parrotron for on-device speech-to-speech conversion

no code implementations • 25 Oct 2022 • Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal

We present a streaming-based approach to produce an acceptable delay, with minimal loss in speech conversion quality, when compared to a reference state-of-the-art non-streaming approach.

Decoder, Quantization, +1

4-bit Conformer with Native Quantization Aware Training for Speech Recognition

1 code implementation • 29 Mar 2022 • Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization

no code implementations • 23 Mar 2022 • Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew Rosenberg, Pedro J. Moreno

We also show that learning a speaker-embedding space can scale further and reduce the amount of personalization training data required per speaker.

Real time spectrogram inversion on mobile phone

1 code implementation • 1 Mar 2022 • Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy

We present two methods of real-time magnitude spectrogram inversion: streaming Griffin-Lim (GL) and streaming MelGAN.

Pareto-Optimal Quantized ResNet Is Mostly 4-bit

4 code implementations • 7 May 2021 • Amirali Abdolrashidi, Lisa Wang, Shivani Agrawal, Jonathan Malmaud, Oleg Rybakov, Chas Leichner, Lukasz Lew

In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves.

Quantization

Streaming keyword spotting on mobile devices

3 code implementations • 14 May 2020 • Oleg Rybakov, Natasha Kononenko, Niranjan Subrahmanya, Mirko Visontai, Stella Laurenzo

In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones.

Audio and Speech Processing, Sound
