Search Results for author: Cyril Allauzen

Found 15 papers, 1 paper with code

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

no code implementations • 23 Jan 2024 • W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath

In the era of large models, the autoregressive nature of decoding often results in latency serving as a significant bottleneck.

Language Modelling Large Language Model +2

Large-scale Language Model Rescoring on Long-form Data

no code implementations • 13 Jun 2023 • Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley

In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR.

Language Modelling speech-recognition +1

Alignment Entropy Regularization

no code implementations • 22 Dec 2022 • Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley

Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

no code implementations • 28 Nov 2022 • W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman

We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model.

Vocal Bursts Valence Prediction

Global Normalization for Streaming Speech Recognition in a Modular Framework

1 code implementation • 26 May 2022 • Ehsan Variani, Ke Wu, Michael Riley, David Rybach, Matt Shannon, Cyril Allauzen

We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition.

speech-recognition Speech Recognition

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

no code implementations • 22 Apr 2022 • W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu

Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition.

Sentence speech-recognition +1

A* shortest string decoding for non-idempotent semirings

no code implementations • 14 Apr 2022 • Kyle Gorman, Cyril Allauzen

We describe an algorithm that finds the shortest string for a weighted non-deterministic automaton over non-idempotent semirings. It uses the backwards shortest distance of an equivalent deterministic automaton (DFA) as a heuristic for A* search performed over a companion idempotent semiring, and this search is proven to return the shortest string.

Hybrid Autoregressive Transducer (hat)

no code implementations • 12 Mar 2020 • Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Federated Learning of N-gram Language Models

no code implementations • CONLL 2019 • Mingqing Chen, Ananda Theertha Suresh, Rajiv Mathews, Adeline Wong, Cyril Allauzen, Françoise Beaufays, Michael Riley

The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard.

Federated Learning Language Modelling

On the Compression of Lexicon Transducers

no code implementations • WS 2019 • Marco Cognetta, Cyril Allauzen, Michael Riley

A delicate balance between comprehensiveness, speed, and memory must be struck to conform to device requirements while providing a good user experience. In this paper, we describe a compression scheme for lexicons represented as finite-state transducers.