Speech Recognition

1093 papers with code • 234 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text: recognizing the words in an audio signal and transcribing them into written form. The goal is to transcribe speech accurately, in real time or from recordings, while coping with factors such as accents, speaking rate, and background noise.

(Image credit: SpecAugment)
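
A minimal inference sketch of the task, assuming the Hugging Face transformers library; the checkpoint name (facebook/wav2vec2-base-960h) and audio path are illustrative, and any pretrained ASR checkpoint can be substituted.

```python
# Hedged sketch: run an off-the-shelf ASR checkpoint over a recording.
# Assumes `pip install transformers` plus an audio backend (e.g. ffmpeg);
# the checkpoint and file path below are examples, not a recommendation.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="facebook/wav2vec2-base-960h")

result = asr("speech.wav")   # placeholder path to a 16 kHz mono recording
print(result["text"])        # -> the transcription as a string
```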

Most implemented papers

Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

mailong25/vietnamese-speech-recognition arXiv 2016

This paper presents a simple end-to-end model for speech recognition, combining a convolutional-network-based acoustic model with graph decoding.
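
A rough sketch of the acoustic-model side of this design, assuming PyTorch: a stack of 1-D convolutions mapping acoustic features to per-frame letter scores. Layer sizes are illustrative, and the graph decoding step is omitted.

```python
# Sketch (not the paper's exact configuration): 1-D ConvNet acoustic model
# emitting per-frame letter scores; decoding over a graph is a separate step.
import torch
import torch.nn as nn

class ConvAcousticModel(nn.Module):
    def __init__(self, n_features=40, n_letters=29):   # e.g. 26 letters + apostrophe, space, blank
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 256, kernel_size=11, stride=2, padding=5),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=11, padding=5),
            nn.ReLU(),
            nn.Conv1d(256, n_letters, kernel_size=1),   # per-frame letter scores
        )

    def forward(self, feats):        # feats: (batch, n_features, time)
        return self.net(feats)       # scores: (batch, n_letters, time/2)

scores = ConvAcousticModel()(torch.randn(1, 40, 200))
print(scores.shape)                  # torch.Size([1, 29, 100])
```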

EEGNet: A Compact Convolutional Network for EEG-based Brain-Computer Interfaces

vlawhern/arl-eegmodels 23 Nov 2016

We introduce the use of depthwise and separable convolutions to construct an EEG-specific model which encapsulates well-known EEG feature extraction concepts for BCI.
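
A short PyTorch sketch of the depthwise + pointwise ("separable") convolution pattern referred to here; the channel counts and kernel size are illustrative, not EEGNet's exact architecture.

```python
# Depthwise convolution (groups = in_channels) followed by a 1x1 pointwise
# convolution; shapes are placeholders, not the EEGNet configuration.
import torch
import torch.nn as nn

in_ch, out_ch = 16, 32
separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=(1, 15), padding=(0, 7),
              groups=in_ch, bias=False),                  # one temporal filter per channel
    nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),  # mix channels pointwise
    nn.BatchNorm2d(out_ch),
    nn.ELU(),
)

x = torch.randn(4, in_ch, 1, 128)    # (batch, channels, height, time samples)
print(separable(x).shape)            # torch.Size([4, 32, 1, 128])
```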

Keyword Transformer: A Self-Attention Model for Keyword Spotting

ARM-software/keyword-transformer 1 Apr 2021

The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition.
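
A hedged sketch of the underlying idea (self-attention over spectrogram frames with a class token for keyword classification), using standard PyTorch modules; all dimensions and the keyword count are illustrative, not the paper's settings.

```python
# Treat spectrogram frames as tokens, prepend a learnable class token, and
# classify the keyword from its final state. Illustrative sizes only.
import torch
import torch.nn as nn

d_model, n_keywords, n_frames, n_mfcc = 64, 12, 98, 40
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2)
proj = nn.Linear(n_mfcc, d_model)                 # per-frame features -> tokens
cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
head = nn.Linear(d_model, n_keywords)

mfcc = torch.randn(8, n_frames, n_mfcc)           # (batch, time, features)
tokens = torch.cat([cls_token.expand(8, -1, -1), proj(mfcc)], dim=1)
logits = head(encoder(tokens)[:, 0])              # read out the class token
print(logits.shape)                               # torch.Size([8, 12])
```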

Efficiently Modeling Long Sequences with Structured State Spaces

hazyresearch/state-spaces ICLR 2022

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies.
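
For orientation, a minimal NumPy sketch of the linear state-space recurrence these models build on, x_k = A x_{k-1} + B u_k, y_k = C x_k; the matrices are random placeholders, and the paper's contribution is structuring A and computing this efficiently over very long sequences.

```python
# Naive sequential evaluation of a discrete linear state-space model;
# random, illustrative matrices (not the paper's structured parameterization).
import numpy as np

state_dim, seq_len = 4, 1000
A = 0.95 * np.eye(state_dim)          # stable transition matrix (placeholder)
B = np.random.randn(state_dim)
C = np.random.randn(state_dim)

u = np.random.randn(seq_len)          # 1-D input sequence
x = np.zeros(state_dim)
y = np.empty(seq_len)
for k in range(seq_len):
    x = A @ x + B * u[k]              # state update
    y[k] = C @ x                      # readout
print(y.shape)                        # (1000,)
```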

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

pytorch/fairseq Preprint 2022

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.
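
A rough sketch of the shared scheme, assuming PyTorch: a student encoder regresses latent targets produced by an exponential-moving-average (EMA) copy of itself on the unmasked input. The tiny encoder, the zero-masking, and the loss here are stand-ins for the paper's Transformer, mask embeddings, and layer-averaged targets.

```python
# Crude teacher-student sketch: an EMA teacher provides latent targets for
# the masked positions the student must predict. Stand-in encoder and masking.
import copy
import torch
import torch.nn as nn

student = nn.Sequential(nn.Linear(40, 64), nn.GELU(), nn.Linear(64, 64))
teacher = copy.deepcopy(student)                 # EMA copy, never trained directly
for p in teacher.parameters():
    p.requires_grad_(False)

x = torch.randn(8, 100, 40)                      # (batch, time, features)
mask = torch.rand(8, 100) < 0.5                  # positions to predict

with torch.no_grad():
    targets = teacher(x)                         # latents from the unmasked input
pred = student(x.masked_fill(mask.unsqueeze(-1), 0.0))    # zero-masking as a stand-in
loss = nn.functional.mse_loss(pred[mask], targets[mask])
loss.backward()

tau = 0.999                                      # EMA update of the teacher
with torch.no_grad():
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(tau).add_(ps, alpha=1 - tau)
```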

Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
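
A hedged usage sketch for the listed openai/whisper repository (installable as openai-whisper); the model size and file path are placeholders.

```python
# Transcribe a recording with a pretrained Whisper checkpoint.
# Requires `pip install openai-whisper` and ffmpeg on the system path.
import whisper

model = whisper.load_model("base")          # "tiny", "small", "medium", "large" also exist
result = model.transcribe("speech.wav")     # placeholder path; detects language, then decodes
print(result["text"])
```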

Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning

PaddlePaddle/PaddleSpeech 21 Sep 2016

Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.
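
A minimal sketch of the joint objective, assuming PyTorch: a weighted sum of a CTC loss on encoder outputs and a cross-entropy loss on attention-decoder outputs. The tensors are random stand-ins for real model outputs, and lam is the interpolation weight.

```python
# Joint CTC-attention multi-task loss with random stand-in model outputs.
import torch
import torch.nn as nn

vocab, T, U, batch, lam = 30, 100, 20, 4, 0.3

enc_log_probs = torch.randn(T, batch, vocab).log_softmax(-1)   # (time, batch, vocab) for CTC
dec_logits = torch.randn(batch, U, vocab)                      # attention-decoder outputs
targets = torch.randint(1, vocab, (batch, U))                  # label IDs (0 reserved for blank)

ctc = nn.CTCLoss(blank=0)(enc_log_probs, targets,
                          torch.full((batch,), T),             # input lengths
                          torch.full((batch,), U))             # target lengths
att = nn.CrossEntropyLoss()(dec_logits.reshape(-1, vocab), targets.reshape(-1))
loss = lam * ctc + (1 - lam) * att                             # multi-task objective
print(float(loss))
```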

wav2letter++: The Fastest Open-source Speech Recognition System

flashlight/wav2letter 18 Dec 2018

This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework.

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

huggingface/transformers 14 Jun 2021

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation.
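
A hedged sketch of extracting HuBERT representations with the listed huggingface/transformers implementation; the checkpoint name (facebook/hubert-base-ls960) is one published example.

```python
# Extract contextual HuBERT features for one second of (random) 16 kHz audio.
import torch
from transformers import AutoFeatureExtractor, HubertModel

name = "facebook/hubert-base-ls960"               # example pretrained checkpoint
extractor = AutoFeatureExtractor.from_pretrained(name)
model = HubertModel.from_pretrained(name)

waveform = torch.randn(16000).numpy()             # stand-in for real audio
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (1, frames, hidden_size)
print(hidden.shape)
```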

Sequence Transduction with Recurrent Neural Networks

TensorSpeech/TensorFlowASR 14 Nov 2012

One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating.
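
A sketch of the transducer layout the paper introduces, assuming PyTorch and torchaudio's RNN-T loss: encoder states and prediction-network states are combined by a joint network into a (time x label) lattice of logits. All sizes are illustrative, and random tensors stand in for the recurrent networks.

```python
# Transducer joint network over random encoder / prediction-network states,
# trained with torchaudio's RNN-T loss. Illustrative sizes only.
import torch
import torch.nn as nn
import torchaudio

vocab, T, U, batch, hid = 30, 50, 10, 2, 64

enc = torch.randn(batch, T, hid)        # encoder states, one per input frame
pred = torch.randn(batch, U + 1, hid)   # prediction-network states (starts from blank)
joint = nn.Linear(hid, vocab)

# broadcast (B, T, 1, H) against (B, 1, U+1, H) -> logits over the full lattice
logits = joint(torch.tanh(enc.unsqueeze(2) + pred.unsqueeze(1)))   # (B, T, U+1, vocab)

targets = torch.randint(1, vocab, (batch, U), dtype=torch.int32)
loss = torchaudio.functional.rnnt_loss(
    logits, targets,
    torch.full((batch,), T, dtype=torch.int32),   # logit lengths
    torch.full((batch,), U, dtype=torch.int32),   # target lengths
    blank=0)
print(float(loss))
```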