Search Results for author: Ehsan Variani

Found 12 papers, 2 papers with code

LAST: Scalable Lattice-Based Speech Modelling in JAX

1 code implementation • 25 Apr 2023 • Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley

We introduce LAST, a LAttice-based Speech Transducer library in JAX.

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition

no code implementations • 16 Feb 2023 • Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran

We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method that injects large-scale unpaired text into the ILM during E2E training, improving rare-word speech recognition.
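
As a rough illustration of joint training with unpaired text, one can picture the objective as a weighted sum of the paired-data E2E loss and a text-only ILM loss. This is a minimal sketch under that assumption; the function name and the weighting are illustrative, not the paper's actual formulation.

```python
# Illustrative sketch (not JEIT's exact formulation): combine the E2E
# loss on paired {audio, transcript} data with an internal-LM loss
# computed on unpaired text only.

def jeit_loss(e2e_loss, ilm_loss, ilm_weight=0.1):
    """Weighted sum of the paired-data E2E loss and the text-only ILM loss.

    `ilm_weight` (hypothetical) trades off transcription accuracy
    against how strongly the unpaired text shapes the internal LM.
    """
    return e2e_loss + ilm_weight * ilm_loss

print(jeit_loss(2.0, 5.0))  # 2.5
```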

Tasks: Language Modelling, Speech Recognition, +1

Alignment Entropy Regularization

no code implementations • 22 Dec 2022 • Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley

Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences.

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Modular Hybrid Autoregressive Transducer

no code implementations • 31 Oct 2022 • Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno

In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder.

Tasks: Language Modelling, Speech Recognition, +1

Global Normalization for Streaming Speech Recognition in a Modular Framework

1 code implementation • 26 May 2022 • Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition.
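

The core idea of global normalization can be sketched in a toy form: instead of normalizing the output distribution locally at every decoding step, each whole label sequence receives an unnormalized score, and a single partition function normalizes over all candidate paths at once. The sketch below is generic and uses made-up scores; it is not the GNAT model itself.

```python
import math

# Toy sketch of global normalization: whole-path scores share one
# partition function, rather than a per-step softmax. Scores are
# made up for illustration.

def globally_normalize(path_scores):
    """Map whole-sequence scores to probabilities with one shared softmax."""
    z = sum(math.exp(s) for s in path_scores.values())
    return {path: math.exp(s) / z for path, s in path_scores.items()}

probs = globally_normalize({"a b": 2.0, "a c": 1.0, "b c": -0.5})
print(round(sum(probs.values()), 6))  # 1.0
```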

Tasks: Speech Recognition

Improving Rare Word Recognition with LM-aware MWER Training

no code implementations • 15 Apr 2022 • Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach

Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training when used in either the shallow fusion or the rescoring setup.
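
For context, shallow fusion is commonly described as interpolating the E2E score with an external-LM score at decoding time. The sketch below is a generic illustration of that rescoring with made-up hypothesis scores and an assumed interpolation weight; it is not the paper's MWER training.

```python
# Generic shallow-fusion sketch (made-up scores, illustrative weight):
# each hypothesis is rescored as log p_E2E(y|x) + w * log p_LM(y),
# letting an external LM promote rare words it has seen in text.

def shallow_fusion_score(log_p_e2e, log_p_lm, lm_weight=0.3):
    return log_p_e2e + lm_weight * log_p_lm

hyps = {
    "play kashmir": (-4.0, -2.0),    # rare word the LM knows well
    "play cash mere": (-3.5, -6.0),  # acoustically close competitor
}
best = max(hyps, key=lambda h: shallow_fusion_score(*hyps[h]))
print(best)  # play kashmir
```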

Hybrid Autoregressive Transducer (HAT)

no code implementations • 12 Mar 2020 • Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems.
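
A defining feature of HAT is factoring the blank posterior out of the label distribution: the blank probability comes from a sigmoid over a scalar score, and label posteriors scale a softmax over label scores by the remaining mass. The simplified sketch below follows that factorization with made-up scores.

```python
import math

# Simplified HAT-style factorization (made-up scores): blank posterior
# via a sigmoid over a scalar blank score, label posteriors as a
# softmax over label scores scaled by (1 - p_blank).

def hat_posteriors(blank_score, label_scores):
    p_blank = 1.0 / (1.0 + math.exp(-blank_score))
    exps = [math.exp(s) for s in label_scores]
    z = sum(exps)
    p_labels = [(1.0 - p_blank) * e / z for e in exps]
    return p_blank, p_labels

p_blank, p_labels = hat_posteriors(0.5, [1.0, 2.0, 0.0])
print(round(p_blank + sum(p_labels), 6))  # 1.0
```

Because blank and label mass are separated, the label softmax can be inspected on its own, which is what makes the model's internal LM recoverable.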

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

no code implementations • 26 Feb 2020 • Erik McDermott, Hasim Sak, Ehsan Variani

The proposed approach is evaluated in cross-domain and limited-data scenarios, for which a significant amount of target domain text data is used for LM training, but only limited (or no) {audio, transcript} training data pairs are used to train the RNN-T.
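
The density ratio idea is usually schematized as adding a target-domain LM score and subtracting a source-domain LM score from the E2E score, so the implicit LM learned from the RNN-T's training data is cancelled out. The sketch below uses illustrative weights and scores, not the paper's tuned values.

```python
# Schematic density-ratio scoring (weights and scores illustrative):
# subtract the source-domain LM score to cancel the implicit LM from
# RNN-T training data, and add a target-domain LM score in its place.

def density_ratio_score(log_p_rnnt, log_p_target_lm, log_p_source_lm,
                        target_weight=0.5, source_weight=0.5):
    return (log_p_rnnt
            + target_weight * log_p_target_lm
            - source_weight * log_p_source_lm)

print(density_ratio_score(-5.0, -2.0, -4.0))  # -4.0
```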

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

WEST: Word Encoded Sequence Transducers

no code implementations • 20 Nov 2018 • Ehsan Variani, Ananda Theertha Suresh, Mitchel Weintraub

Most of the parameters in large-vocabulary models are used in the embedding layer, which maps categorical features to vectors, and in the softmax layer, which holds the classification weights.
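
A back-of-the-envelope count shows why these two layers dominate; the sizes below are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope parameter count (illustrative sizes): with a
# large vocabulary, the embedding and softmax matrices dwarf typical
# recurrent-layer parameter counts.

def embedding_softmax_params(vocab_size, embed_dim, hidden_dim):
    embedding = vocab_size * embed_dim   # token -> vector lookup table
    softmax = hidden_dim * vocab_size    # classification weight matrix
    return embedding + softmax

print(embedding_softmax_params(100_000, 512, 512))  # 102400000
```

At a 100k vocabulary and 512-dimensional layers, these two matrices alone hold over 100M parameters, which is the motivation for compressing them.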

Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2
