Speech Recognition

220 papers with code · Speech

Speech recognition is the task of recognising speech within audio and converting it into text.

(Image credit: SpecAugment)

Greatest papers with code

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

8 Dec 2015 · tensorflow/models

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.

SOTA for Speech Recognition on WSJ eval93 (using extra training data)

ACCENTED SPEECH RECOGNITION · END-TO-END SPEECH RECOGNITION · NOISY SPEECH RECOGNITION

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 Apr 2019 · mozilla/DeepSpeech

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.

#3 best model for Speech Recognition on LibriSpeech test-clean (using extra training data)

DATA AUGMENTATION · END-TO-END SPEECH RECOGNITION · LANGUAGE MODELLING · SPEECH RECOGNITION
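
The WER figures quoted in the SpecAugment entry above are word error rates: the number of word substitutions, deletions and insertions needed to turn the recognised text into the reference transcript, divided by the number of reference words. A minimal sketch of that metric follows. The function name `word_error_rate` is hypothetical, and whitespace tokenisation is an assumption; published results normally apply dataset-specific text normalisation first, which is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For the reference "the cat sat on the mat" and the hypothesis "the cat sat on mat", one of six reference words is deleted, so the function returns 1/6, a WER of roughly 16.7%.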

Deep Speech: Scaling up end-to-end speech recognition

17 Dec 2014 · mozilla/DeepSpeech

We present a state-of-the-art speech recognition system developed using end-to-end deep learning.

ACCENTED SPEECH RECOGNITION · END-TO-END SPEECH RECOGNITION

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

12 Oct 2019 · pytorch/fairseq

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task.

SPEECH RECOGNITION

wav2vec: Unsupervised Pre-training for Speech Recognition

11 Apr 2019 · pytorch/fairseq

Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.

SPEECH RECOGNITION

Semi-Supervised Speech Recognition via Local Prior Matching

24 Feb 2020 · facebookresearch/wav2letter

For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability.

LANGUAGE MODELLING · SPEECH RECOGNITION

wav2letter++: The Fastest Open-source Speech Recognition System

18 Dec 2018 · facebookresearch/wav2letter

This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework.

SPEECH RECOGNITION

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

12 Aug 2014 · baidu-research/warp-ctc

This approach to decoding enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems.

LANGUAGE MODELLING · LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION · SPEECH RECOGNITION

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

25 May 2018 · snipsco/snips-nlu

This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices.

SPEECH RECOGNITION · SPOKEN LANGUAGE UNDERSTANDING

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit

24 Oct 2019 · espnet/espnet

Furthermore, the unified design enables the integration of ASR functions with TTS, e.g., ASR-based objective evaluation and semi-supervised learning with both ASR and TTS models.

SPEECH RECOGNITION