Speech Recognition

1089 papers with code • 316 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio recording and transcribing them in written form. The goal is to transcribe speech accurately, either in real time or from recorded audio, despite factors such as accents, speaking rate, and background noise.

(Image credit: SpecAugment)
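Transcription accuracy is commonly reported as word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. Below is a minimal sketch using a standard word-level edit distance. The function name `word_error_rate` is illustrative, and the sketch assumes whitespace-tokenized, already-normalized text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```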

Libraries

Use these libraries to find Speech Recognition models and implementations. Of the 16 libraries listed, the top three cover 16, 13, and 11 papers respectively.

Most implemented papers

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

snipsco/snips-nlu 25 May 2018

This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices.

LSTM: A Search Space Odyssey

flukeskywalker/highway-networks 13 Mar 2015

Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995.
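For reference, here is a minimal plain-Python sketch of one time step of a standard LSTM cell. It includes the input, forget, and output gates plus a tanh candidate. It is not the paper's code: it omits the peephole connections that some variants add. The parameter layout is a hypothetical convention chosen for illustration: `params` maps each gate name to a weight matrix over the concatenated input and previous hidden state, plus a bias.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell without peephole connections.

    params maps each gate name ("i", "f", "o", "g") to (W, b), where W has one
    row per hidden unit over the concatenation [x; h_prev] (hypothetical layout).
    """
    z = x + h_prev  # list concatenation: [x; h_prev]
    pre = {name: [a + bias for a, bias in zip(matvec(W, z), b)]
           for name, (W, b) in params.items()}
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell values
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


# Usage: input size 2, hidden size 1, all-zero weights, so every gate is 0.5
# and the candidate is 0; the cell state is therefore halved.
D, H = 2, 1
params = {name: ([[0.0] * (D + H) for _ in range(H)], [0.0] * H)
          for name in ("i", "f", "o", "g")}
h, c = lstm_step([1.0, -1.0], [0.0], [1.0], params)
```

The forget gate is what lets the cell state carry information across many steps. With the forget gate open (near 1) and the input gate closed (near 0), `c` passes through unchanged.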

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

NVIDIA/NeMo 22 Oct 2019

We propose a new end-to-end neural acoustic model for automatic speech recognition.
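QuartzNet's blocks are built from 1D time-channel separable convolutions. The first stage is a depthwise convolution that filters each channel along time independently. The second is a pointwise (1×1) convolution that mixes channels at each time step. For kernel size K, this factorization needs C_in·K + C_in·C_out weights instead of the C_in·C_out·K of a full convolution.

The following is a minimal plain-Python sketch of that factorization. It assumes stride 1, odd kernel sizes, and "same" zero padding. It omits the batch normalization and ReLU that the published blocks apply afterwards. The function names are hypothetical.

```python
def depthwise_conv1d(x, kernels):
    """Filter each channel along time with its own kernel.

    x: list of C channels, each a list of T samples.
    kernels: list of C kernels of odd length K. Uses 'same' zero padding and
    stride 1, and computes cross-correlation as deep-learning frameworks do.
    """
    out = []
    for signal, k in zip(x, kernels):
        pad = len(k) // 2
        padded = [0.0] * pad + list(signal) + [0.0] * pad
        out.append([sum(k[j] * padded[t + j] for j in range(len(k)))
                    for t in range(len(signal))])
    return out


def pointwise_conv1d(x, weights):
    """1x1 convolution: mix channels independently at every time step.

    weights: C_out rows, each holding C_in mixing coefficients.
    """
    T = len(x[0])
    return [[sum(w[ch] * x[ch][t] for ch in range(len(x))) for t in range(T)]
            for w in weights]


def time_channel_separable_conv1d(x, kernels, weights):
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


# Usage: two input channels, kernel size 3, one output channel.
x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
kernels = [[0.0, 1.0, 0.0],   # identity filter for channel 0
           [1.0, 1.0, 1.0]]   # 3-tap moving sum for channel 1
weights = [[1.0, 1.0]]        # add the two filtered channels
y = time_channel_separable_conv1d(x, kernels, weights)  # [[2.0, 3.0, 4.0]]
```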

Attention-Based Models for Speech Recognition

Alexander-H-Liu/End-to-end-ASR-Pytorch NeurIPS 2015

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.
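At each decoder step, an attention mechanism scores every encoder frame against the current decoder state and normalizes the scores with a softmax. It then takes the weighted sum of encoder outputs as a context vector. The sketch below uses plain dot-product (content-based) scoring for brevity. The paper itself proposes a location-aware attention mechanism, which this sketch does not implement. Names are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Dot-product (content-based) attention for one decoder step.

    query: decoder state (length d); keys: one length-d vector per encoder
    frame; values: one vector per encoder frame (the encoder outputs).
    Returns the attention weights and the context vector.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context


# Usage: the query matches the first encoder frame best,
# so that frame receives most of the attention weight.
frames = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend([1.0, 0.0], frames, frames)
```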

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

hendrycks/error-detection 7 Oct 2016

We consider the two related problems of detecting if an example is misclassified or out-of-distribution.
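The baseline proposed in this paper scores each example by its maximum softmax probability. Correctly classified, in-distribution examples tend to receive higher maximum probabilities than misclassified or out-of-distribution ones. Below is a minimal sketch. The fixed `threshold` is a hypothetical parameter for illustration. The paper evaluates the score with threshold-independent metrics (AUROC and AUPR) rather than a single cutoff.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_probability(logits):
    """Confidence score: the largest class probability."""
    return max(softmax(logits))


def looks_suspicious(logits, threshold=0.5):
    """Flag an example as possibly misclassified or out-of-distribution.

    threshold is a hypothetical cutoff chosen for illustration.
    """
    return max_softmax_probability(logits) < threshold


print(looks_suspicious([10.0, 0.0, 0.0]))  # confident prediction -> False
print(looks_suspicious([0.1, 0.0, 0.0]))   # nearly uniform prediction -> True
```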

Improved training of end-to-end attention models for speech recognition

rwth-i6/returnn 8 May 2018

Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.
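Subword units are often learned with byte-pair encoding (BPE), which starts from characters and repeatedly merges the most frequent adjacent symbol pair. The following is a minimal sketch of learning merges from a word-frequency table. It assumes a `</w>` end-of-word marker and breaks ties by first occurrence. The paper's actual subword configuration may differ.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a {word: count} table.

    Each word starts as a tuple of characters plus an end-of-word marker
    "</w>". Ties between equally frequent pairs go to the pair seen first.
    """
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with one merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe({"hug": 10, "pug": 5, "hugs": 5}, 3))
```

Applying the learned merges in order to a new word yields its subword segmentation. This is why unseen words can still be written with a fixed symbol inventory, which gives open-vocabulary recognition.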

The PyTorch-Kaldi Speech Recognition Toolkit

mravanelli/pytorch-kaldi 19 Nov 2018

Experiments conducted on several datasets and tasks show that PyTorch-Kaldi can effectively be used to develop modern state-of-the-art speech recognizers.

CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition

cornerfarmer/ctc_segmentation 17 Jul 2020

In this work, we combine freely available corpora for German speech recognition, including as yet unlabeled speech data, into a large dataset of over 1700 h of speech data.

Jasper: An End-to-End Convolutional Neural Acoustic Model

osmr/imgclsmob 5 Apr 2019

In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

theamrzaki/text_summurization_abstractive_methods NeurIPS 2015

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning.
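Scheduled sampling, as proposed in this paper, feeds the decoder the ground-truth previous token with probability ε_i. Otherwise the decoder receives its own previous prediction. The probability ε_i decays over training step i. The paper describes linear, exponential, and inverse sigmoid decay schedules. The sketch below implements those three forms plus the per-step coin flip. The default values of `k`, `c`, and `eps_min` are hypothetical choices, not values from the paper.

```python
import math
import random


def linear_decay(i, k=1.0, c=1e-4, eps_min=0.0):
    # eps_i = max(eps_min, k - c * i)
    return max(eps_min, k - c * i)


def exponential_decay(i, k=0.9999):
    # eps_i = k ** i, with 0 < k < 1
    return k ** i


def inverse_sigmoid_decay(i, k=1000.0):
    # eps_i = k / (k + exp(i / k)), with k >= 1; larger k decays more slowly
    return k / (k + math.exp(i / k))


def choose_previous_token(true_token, predicted_token, eps, rng=random):
    """Feed the ground-truth token with probability eps, else the model's prediction."""
    return true_token if rng.random() < eps else predicted_token


# Usage: probability of feeding the ground-truth token early vs. late in training.
for step in (0, 5000, 20000):
    print(step, round(inverse_sigmoid_decay(step), 3))
```

Early in training ε_i is close to 1, so the model behaves like standard teacher-forced training. As ε_i falls, it increasingly conditions on its own outputs, as it must at inference time.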