Sequence-To-Sequence Speech Recognition

7 papers with code • 0 benchmarks • 0 datasets

Most implemented papers

On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition

30stomercury/Automatic_Speech_Recognition 5 Feb 2019

We also investigate model complementarity: we find that we can improve WERs by up to 9% relative by rescoring N-best lists generated from a strong word-piece based baseline with either the phoneme or the grapheme model.
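As a rough illustration of N-best rescoring, the sketch below re-ranks a small list of hypotheses by combining a first-pass score with a score from a second model. The snippet above does not say how the paper combines scores, so the linear interpolation of log-probabilities, the `Hypothesis` fields, the `weight` parameter, and the example scores are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float   # log-probability from the baseline model (hypothetical field)
    second_pass_score: float  # log-probability from the rescoring model (hypothetical field)


def rescore_nbest(nbest, weight=0.5):
    """Return hypotheses sorted best-first by an interpolated log score.

    weight=0.0 keeps the first-pass ranking; weight=1.0 trusts only the
    second model. Linear interpolation is an assumed combination rule.
    """
    def combined(h):
        return (1.0 - weight) * h.first_pass_score + weight * h.second_pass_score

    return sorted(nbest, key=combined, reverse=True)


# Made-up scores, chosen so the second model changes the top hypothesis.
nbest = [
    Hypothesis("recognize speech", -4.0, -6.0),
    Hypothesis("wreck a nice beach", -3.5, -9.0),
]
best = rescore_nbest(nbest, weight=0.5)[0]
print(best.text)
```

In practice the interpolation weight would be tuned on held-out data; the value here is arbitrary.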

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

tensorflow/lingvo 21 Feb 2019

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.

Sequence-to-Sequence Models Can Directly Translate Foreign Speech

colaprograms/speechify 24 Mar 2017

We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another.

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

gentaiscool/end2end-asr-pytorch 28 Apr 2018

Furthermore, we compare a syllable-based model with a context-independent phoneme (CI-phoneme) based model, both using the Transformer, in Mandarin Chinese.

Multimodal Grounding for Sequence-to-Sequence Speech Recognition

srvk/how2-dataset 9 Nov 2018

Specifically, in our previous work, we proposed a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system.

Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

dannigt/NMTGMinor.lowLatency 22 May 2020

On How2 English-Portuguese speech translation, we reduce latency to 0.7 seconds (-84% rel.).

Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition

thaisonngn/pynn 5 Jul 2021

To alleviate this problem, we supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.