Sequence-To-Sequence Speech Recognition
7 papers with code • 0 benchmarks • 0 datasets
These leaderboards are used to track progress in Sequence-To-Sequence Speech Recognition.
We also investigate model complementarity: we find that we can improve WERs by up to 9% relative by rescoring N-best lists generated from a strong word-piece-based baseline with either the phoneme or the grapheme model.
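The rescoring idea in the snippet above can be sketched as a score interpolation: each hypothesis in the baseline's N-best list gets a combined score from the baseline and an auxiliary (e.g. phoneme or grapheme) model, and the list is re-ranked. This is a minimal illustrative sketch, not the paper's implementation; `rescore_nbest`, the `aux_score` callable, and the toy auxiliary model are all hypothetical.

```python
def rescore_nbest(nbest, aux_score, weight=0.3):
    """Re-rank an N-best list by interpolating each hypothesis's baseline
    log-score with an auxiliary model's log-score (hypothetical helper)."""
    rescored = []
    for hyp, base_logp in nbest:
        combined = (1 - weight) * base_logp + weight * aux_score(hyp)
        rescored.append((hyp, combined))
    # Highest combined score first.
    return sorted(rescored, key=lambda x: x[1], reverse=True)

# Toy example: the baseline slightly prefers the wrong transcript,
# but a stand-in auxiliary model strongly penalizes it.
nbest = [("the cat sat", -5.0), ("the cat sad", -4.8)]
aux = lambda hyp: 0.0 if hyp.endswith("sat") else -10.0  # stand-in scorer
best_hyp = rescore_nbest(nbest, aux, weight=0.3)[0][0]
print(best_hyp)  # → the cat sat
```

The interpolation weight would normally be tuned on a held-out set; here it is fixed for illustration.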
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.
We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another.
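The architecture described above — a recurrent encoder that consumes speech frames and a recurrent decoder that emits target-language tokens — can be sketched with a forward pass in plain NumPy. This is a toy, attention-free illustration under assumed tiny dimensions, not the paper's model; `rnn_step`, `translate`, and all weight shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(x, h, Wx, Wh, b):
    # One vanilla-RNN step: new hidden state from input x and previous h.
    return np.tanh(Wx @ x + Wh @ h + b)

# Tiny illustrative dimensions: acoustic frames in, token logits out.
D_IN, D_H, V = 8, 16, 10  # feature dim, hidden dim, target vocab size
enc = dict(Wx=rng.normal(0, 0.1, (D_H, D_IN)),
           Wh=rng.normal(0, 0.1, (D_H, D_H)),
           b=np.zeros(D_H))
dec = dict(Wx=rng.normal(0, 0.1, (D_H, V)),
           Wh=rng.normal(0, 0.1, (D_H, D_H)),
           b=np.zeros(D_H))
W_out = rng.normal(0, 0.1, (V, D_H))

def translate(frames, max_len=5):
    """Encode a sequence of acoustic frames into a single hidden state,
    then greedily decode target-token ids from it (forward pass only)."""
    h = np.zeros(D_H)
    for x in frames:                  # encoder consumes the speech frames
        h = rnn_step(x, h, **enc)
    tokens, prev = [], np.zeros(V)
    for _ in range(max_len):          # decoder emits target-language tokens
        h = rnn_step(prev, h, **dec)
        tok = int(np.argmax(W_out @ h))
        tokens.append(tok)
        prev = np.eye(V)[tok]         # feed back one-hot of the last token
    return tokens

frames = rng.normal(size=(20, D_IN))  # stand-in for log-mel speech features
print(translate(frames))
```

A real system would train these weights end to end and typically add attention between decoder and encoder states; this sketch only shows the data flow.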
Furthermore, we compare a syllable-based model with a context-independent phoneme (CI-phoneme) based model using the Transformer in Mandarin Chinese.
Specifically, in our previous work, we proposed a multistep visual adaptive training approach that improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system.
On How2 English-Portuguese speech translation, we reduce latency to 0.7 seconds (-84% relative).
To alleviate this problem, we supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
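The memory idea above — biasing recognition toward a list of known words and phrases — can be illustrated with a much simpler post-hoc sketch: fuzzy-matching words in a hypothesis against a phrase memory. The paper's actual mechanism accesses the memory during decoding; this `recover_rare_phrases` helper and its memory list are hypothetical stand-ins.

```python
import difflib

def recover_rare_phrases(hypothesis, memory, cutoff=0.8):
    """Replace misrecognized words in an ASR hypothesis with the closest
    entry from a word/phrase memory (simplified post-hoc illustration)."""
    out = []
    for word in hypothesis.split():
        # Standard-library fuzzy match against the memory entries.
        match = difflib.get_close_matches(word, memory, n=1, cutoff=cutoff)
        out.append(match[0] if match else word)
    return " ".join(out)

memory = ["kubernetes", "tensorflow"]  # assumed rare-word list
print(recover_rare_phrases("deploy kubernets on the cluster", memory))
# → deploy kubernetes on the cluster
```

The cutoff keeps common words from being overwritten; an in-decoder memory mechanism would instead adjust token probabilities before the hypothesis is finalized.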