Speech Recognition

188 papers with code · Speech

Speech recognition is the task of recognising speech within audio and converting it into text.

Greatest papers with code

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

8 Dec 2015 · tensorflow/models

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.

SOTA for Speech Recognition on WSJ eval93 (using extra training data)

ACCENTED SPEECH RECOGNITION · END-TO-END SPEECH RECOGNITION · NOISY SPEECH RECOGNITION

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 Apr 2019 · mozilla/DeepSpeech

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.

SOTA for Speech Recognition on LibriSpeech test-clean (using extra training data)

DATA AUGMENTATION · END-TO-END SPEECH RECOGNITION · LANGUAGE MODELLING · SPEECH RECOGNITION
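
Several entries on this page report results as word error rate (WER). As a rough illustration, WER is commonly computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The sketch below assumes plain whitespace tokenisation and no text normalisation; the function name `word_error_rate` and the example sentences are hypothetical and are not taken from any of the listed papers or toolkits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution ("sat" -> "sit") and one
    # deletion (the second "the") against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Published evaluations usually normalise transcripts before scoring and may use dedicated scoring tools, so figures from this sketch are not directly comparable to the numbers quoted above.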

Deep Speech: Scaling up end-to-end speech recognition

17 Dec 2014 · mozilla/DeepSpeech

We present a state-of-the-art speech recognition system developed using end-to-end deep learning.

ACCENTED SPEECH RECOGNITION · END-TO-END SPEECH RECOGNITION

End-to-end speech recognition using lattice-free MMI

Interspeech 2018 · kaldi-asr/kaldi

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models.

END-TO-END SPEECH RECOGNITION · SPEECH RECOGNITION

Neural Network Language Modeling with Letter-based Features and Importance Sampling

ICASSP 2018 · kaldi-asr/kaldi

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.

LANGUAGE MODELLING · SPEECH RECOGNITION

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

27 Mar 2018 · kaldi-asr/kaldi

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge to promote the development of noisy ASR in speech processing communities by providing 1) state-of-the-art system with a simplified single system comparable to the complicated top systems in the challenge, 2) publicly available and reproducible recipe through the main repository in the Kaldi speech recognition toolkit.

DISTANT SPEECH RECOGNITION · LANGUAGE MODELLING · NOISY SPEECH RECOGNITION · SPEECH ENHANCEMENT

Purely sequence-trained neural networks for ASR based on lattice-free MMI

INTERSPEECH 2016 · kaldi-asr/kaldi

Models trained with LFMMI provide a relative word error rate reduction of ∼11.5%, over those trained with cross-entropy objective function, and ∼8%, over those trained with cross-entropy and sMBR objective functions.

LANGUAGE MODELLING · LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION · SPEECH RECOGNITION
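
The entry above quotes a relative, not absolute, word error rate reduction. As a small worked sketch of the difference, the relative reduction is the WER drop divided by the baseline WER. The function name `relative_wer_reduction` and the WER values in the example are hypothetical, chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer.

    Both arguments must be on the same scale (for example, percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical values: a baseline at 10.0% WER and a new system at
    # 8.85% WER differ by 1.15 points absolute, which is roughly an
    # 11.5% relative reduction.
    print(round(relative_wer_reduction(10.0, 8.85), 3))
```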

wav2vec: Unsupervised Pre-training for Speech Recognition

11 Apr 2019 · pytorch/fairseq

Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.

SPEECH RECOGNITION

Who Needs Words? Lexicon-Free Speech Recognition

9 Apr 2019 · facebookresearch/wav2letter

Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words.

SPEECH RECOGNITION

wav2letter++: The Fastest Open-source Speech Recognition System

18 Dec 2018 · facebookresearch/wav2letter

This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework.

SPEECH RECOGNITION