Speech Recognition
1108 papers with code • 233 benchmarks • 88 datasets
Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into a written format. The goal is to transcribe speech accurately, either in real time or from recorded audio, while accounting for factors such as accents, speaking speed, and background noise.
(Image credit: SpecAugment)
Most implemented papers
LSTM: A Search Space Odyssey
Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995.
Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces
This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices.
QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions
We propose a new end-to-end neural acoustic model for automatic speech recognition.
Attention-Based Models for Speech Recognition
Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
We consider the two related problems of detecting if an example is misclassified or out-of-distribution.
Improved training of end-to-end attention models for speech recognition
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.
The PyTorch-Kaldi Speech Recognition Toolkit
Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.
CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition
In this work, we combine freely available corpora for German speech recognition, including as-yet unlabeled speech data, into a large dataset of over 1700 h of speech data.
Jasper: An End-to-End Convolutional Neural Acoustic Model
In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning.