Speech Recognition
1093 papers with code • 234 benchmarks • 87 datasets
Speech Recognition is the task of converting spoken language into text: recognizing the words in an audio signal and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, despite variation in accent, speaking speed, and background noise.
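Transcription accuracy is commonly reported as word error rate (WER): the word-level edit distance between the reference and the system output, divided by the number of reference words. This listing does not name a metric, so WER is an assumption here. The following is a minimal sketch; the function name `word_error_rate` is illustrative and it applies no text normalization (casing, punctuation), which real evaluations usually do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.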
(Image credit: SpecAugment)
Most implemented papers
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
This paper presents a simple end-to-end model for speech recognition, combining a convolutional-network-based acoustic model with graph decoding.
EEGNet: A Compact Convolutional Network for EEG-based Brain-Computer Interfaces
We introduce the use of depthwise and separable convolutions to construct an EEG-specific model which encapsulates well-known EEG feature extraction concepts for BCI.
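To illustrate the building block named above, here is a generic 1-D depthwise separable convolution in plain Python. It is a sketch of the general technique, not the EEGNet architecture itself: a depthwise step filters each channel with its own kernel, then a pointwise (1x1) step mixes channels. All function names and the example shapes are assumptions made for illustration.

```python
def depthwise_conv1d(x, kernels):
    """Filter each channel with its own kernel (valid cross-correlation, stride 1)."""
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[t + i] * kernel[i] for i in range(k))
            for t in range(len(channel) - k + 1)
        ])
    return out


def pointwise_conv1d(x, weights):
    """Mix channels at every time step; weights has shape [out_channels][in_channels]."""
    steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(steps)]
        for row in weights
    ]


def separable_conv1d(x, kernels, weights):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


def param_counts(c_in, c_out, k):
    """Weights (no biases) in a standard conv versus a depthwise separable one."""
    standard = c_in * c_out * k
    separable = c_in * k + c_in * c_out
    return standard, separable


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [0, 1, 0, 1]]   # 2 channels, 4 time steps
    kernels = [[1, 1], [1, -1]]        # one length-2 kernel per channel
    weights = [[1, 1], [2, 0]]         # 2 output channels
    print(separable_conv1d(x, kernels, weights))
    print(param_counts(c_in=8, c_out=16, k=5))
```

The parameter comparison shows why the factorization yields compact models: the kernel length multiplies only the number of input channels, not the product of input and output channels.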
Keyword Transformer: A Self-Attention Model for Keyword Spotting
The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition.
Efficiently Modeling Long Sequences with Structured State Spaces
A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies.
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.
Robust Speech Recognition via Large-Scale Weak Supervision
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.
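The multi-task objective referred to in the title can be written as an interpolation of the two losses. This sketch reflects the usual formulation of joint CTC-attention training; the weight symbol \lambda is a tunable hyperparameter and its value is not given in this listing.

```latex
\mathcal{L}_{\mathrm{MTL}}
  = \lambda \, \mathcal{L}_{\mathrm{CTC}}
  + (1 - \lambda) \, \mathcal{L}_{\mathrm{Attention}},
\qquad 0 \le \lambda \le 1
```

Setting \lambda = 0 recovers a pure attention-based model, and \lambda = 1 a pure CTC model.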
wav2letter++: The Fastest Open-source Speech Recognition System
This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework.
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation.
Sequence Transduction with Recurrent Neural Networks
One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating.