Speech Recognition

1102 papers with code • 234 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them into written form. The goal is an accurate transcript, produced either in real time or from recorded audio, that holds up under variation in accent, speaking speed, and background noise.
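Transcription accuracy is usually reported as word error rate (WER), the metric quoted in the SpecAugment entry in the list below. The following is a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words). It assumes whitespace tokenization and no text normalization; the function name `word_error_rate` is illustrative and is not taken from any of the listed repositories.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are split on whitespace, with no casing or
    punctuation normalization (real evaluations usually normalize first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.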

Most implemented papers

Listen, Attend and Spell

Alexander-H-Liu/End-to-end-ASR-Pytorch 5 Aug 2015

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

PaddlePaddle/PaddleSpeech 8 Dec 2015

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.

Communication-Efficient Learning of Deep Networks from Decentralized Data

adap/flower 17 Feb 2016

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device.

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

retrocirce/hts-audio-transformer 9 Apr 2018

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

mozilla/DeepSpeech 18 Apr 2019

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.

Deep Speech: Scaling up end-to-end speech recognition

PaddlePaddle/PaddleSpeech 17 Dec 2014

We present a state-of-the-art speech recognition system developed using end-to-end deep learning.

Conformer: Convolution-augmented Transformer for Speech Recognition

PaddlePaddle/PaddleSpeech 16 May 2020

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs).

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

pytorch/fairseq NeurIPS 2020

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Recurrent Neural Network Regularization

wojzaremba/lstm 8 Sep 2014

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units.

Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges

autoliuweijie/FastBERT 8 Mar 2021

Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others.