Speech Recognition

719 papers with code • 278 benchmarks • 184 datasets

Speech recognition is the task of recognising speech within audio and converting it into text.




Most implemented papers

Listen, Attend and Spell

Alexander-H-Liu/End-to-end-ASR-Pytorch 5 Aug 2015

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Neural Collaborative Filtering

microsoft/recommenders WWW 2017

When it comes to modeling the key factor in collaborative filtering, the interaction between user and item features, earlier deep-learning recommenders still resorted to matrix factorization and applied an inner product to the latent features of users and items.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

PaddlePaddle/PaddleSpeech 8 Dec 2015

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

mozilla/DeepSpeech 18 Apr 2019

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.
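
The WER (word error rate) figures quoted above count word-level substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. A minimal sketch of that calculation follows, assuming whitespace tokenization and no text normalization. The function name `word_error_rate` is illustrative and is not taken from the paper or any listed repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"):
    # 2 errors over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.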

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

retrocirce/hts-audio-transformer 9 Apr 2018

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.

Deep Speech: Scaling up end-to-end speech recognition

PaddlePaddle/PaddleSpeech 17 Dec 2014

We present a state-of-the-art speech recognition system developed using end-to-end deep learning.

Recurrent Neural Network Regularization

wojzaremba/lstm 8 Sep 2014

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units.

Conformer: Convolution-augmented Transformer for Speech Recognition

PaddlePaddle/PaddleSpeech 16 May 2020

Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in automatic speech recognition (ASR), outperforming recurrent neural networks (RNNs).

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

pytorch/fairseq NeurIPS 2020

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Improved training of end-to-end attention models for speech recognition

rwth-i6/returnn 8 May 2018

Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.