Speech Recognition

1089 papers with code • 316 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into a written format. The goal is to accurately transcribe the speech in real-time or from recorded audio, taking into account factors such as accents, speaking speed, and background noise.

( Image credit: SpecAugment )

Benchmarks

Add a Result

These leaderboards are used to track progress in Speech Recognition

Dataset	Best Model	Compare
LibriSpeech test-clean	Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light	See all
LibriSpeech test-other	parakeet-rnnt-1.1b	See all
TIMIT	wav2vec 2.0	See all
Switchboard + Hub500	IBM (LSTM+Conformer encoder-decoder)	See all
WSJ eval92	Speechstew 100M	See all
Common Voice German	wav2vec 2.0 XLS-R 1B + TEVR (5-gram)	See all
AISHELL-1	Qwen-Audio	See all
Common Voice Spanish	ConformerCTC-L (4-gram)	See all
Common Voice French	ConformerCTC-L (5-gram)	See all
MediaSpeech	Quartznet	See all
TUDA	Conformer-Transducer (no LM)	See all
SLUE	W2V2-L-LL60K (+ TED-LIUM 3 LM)	See all
VietMed	XLSR-53-Viet	See all
swb_hub_500 WER fullSWBCH	IBM (LSTM+Conformer encoder-decoder)	See all
WenetSpeech	Paraformer-large	See all
Hub5'00 SwitchBoard	LAS + SpecAugment (with LM, Switchboard mild policy)	See all
Libri-Light test-clean	wav2vec 2.0 Large-10h-LV-60k	See all
Libri-Light test-other	wav2vec 2.0 Large-10h-LV-60k	See all
EasyCom	ReVISE (bf)	See all
CHiME-6 dev_gss12	ConformerXXL-PS + G-Augment	See all
WSJ dev93	CTC-CRF ST-NAS	See all
WSJ eval93	Deep Speech 2	See all
Fongbe audio	Triphone (39 features) + LDA and MLLT + SGMM	See all
CHiME-6 eval	ConformerXXL-PS + G-Augment	See all
Common Voice Italian	Whisper (Large v2)	See all
SPGISpeech	Icefall - zipformer transducer	See all
Common Voice	ConformerXXL-P + Downstream NST	See all
Common Voice English	parakeet-rnnt-1.1b	See all
VIVOS	Vietnamese end-to-end speech recognition using wav2vec 2.0 by VietAI	See all
Common Voice vi	wav2vec2-base-vietnamese-160h (No Language Model)	See all
AMI IMH	ConformerXXL-P + Downstream NST	See all
AMI SDM1	ConformerXXL-P	See all
Tedlium	parakeet-rnnt-1.1b	See all
Europarl-ASR EN Guest-test	mllp_2021_offline_verb	See all
Europarl-ASR EN MEP-test	mllp_2021_offline_filt	See all
Speech Commands	Liquid-S4	See all
LRS3-TED	AV-HuBERT Large	See all
LibriCSS	TS-SEP	See all
AISHELL-2	Paraformer-large	See all
Switchboard (300hr)	End-to-end LF-MMI	See all
Hub5'00 CallHome	Espresso	See all
Hub5'00 FISHER-SWBD	CTC-CRF	See all
LibriSpeech train-clean-100 test-clean	wav2vec_wav2letter	See all
LibriSpeech train-clean-100 test-other	wav2vec_wav2letter	See all
Common Voice Portuguese	XLSR53 Wav2Vec2 Portuguese by Orlem Santos	See all
Common Voice Frisian	wav2vec2-large-xls-r-1b-frisian	See all
GigaSpeech	Conformer/Transformer-AED	See all
GigaSpeech DEV	Conformer/Transformer-AED	See all
GigaSpeech TEST	Conformer/Transformer-AED	See all
Switchboard SWBD	SpeechStew (100M)	See all
Switchboard CallHome	SpeechStew (100M)	See all
Google Speech Commands - Musan	ImportantAug	See all
LRS2	RAVEn Large	See all
CALLHOME Spanish Speech	TDT 0-2	See all
facebook/multilingual_librispeech german	TDT 0-4	See all
AISHELL-2 Test IOS	Qwen-Audio	See all
AISHELL-2 Test Mic	Qwen-Audio	See all
AISHELL-2 Test Android	Qwen-Audio	See all
TED-LIUM	ConformerXXL-PS	See all

Show all 315 benchmarks

Collapse benchmarks

Libraries

Use these libraries to find Speech Recognition models and implementations

espnet/espnet

16 papers

7,858

msalhab96/SpeeQ

13 papers

PaddlePaddle/PaddleSpeech

12 papers

10,095

pytorch/fairseq

11 papers

29,201

See all 16 libraries.

Datasets

Subtasks

Distant Speech Recognition

Target Speaker Extraction

Sequence-To-Sequence Speech Recognition

Automatic Lyrics Transcription

Accented Speech Recognition

Noisy Speech Recognition

English Conversational Speech Recognition

Latest papers

Most implemented Social Latest No code

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

akreal/bloomzmms • • 16 Apr 2024

Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.

16 Apr 2024

Paper
Code

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

leduckhai/multimed • 8 Apr 2024

VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration.

08 Apr 2024

Paper
Code

CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models

neulab/cmulab • 3 Apr 2024

Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models.

03 Apr 2024

Paper
Code

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition

ahaliassos/raven • • 2 Apr 2024

In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data.

02 Apr 2024

Paper
Code

Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal

gauthelo/kallaama-speech-dataset • 2 Apr 2024

To build such technologies, we provide textual corpora in Wolof and Pulaar, and a pronunciation lexicon containing 49, 132 entries from the Wolof dataset.

02 Apr 2024

Paper
Code

FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

y0ngjaenius/cvpr2024_flowerformer • • 19 Mar 2024

The success of a specific neural network architecture is closely tied to the dataset and task it tackles; there is no one-size-fits-all solution.

19 Mar 2024

Paper
Code

SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages

ankilab/spoken-100 • • 14 Mar 2024

Benchmarking plays a pivotal role in assessing and enhancing the performance of compact deep learning models designed for execution on resource-constrained devices, such as microcontrollers.

14 Mar 2024

Paper
Code

SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation

speechcolab/leaderboard • • 13 Mar 2024

In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation.

379

13 Mar 2024

Paper
Code

Real-Time Multimodal Cognitive Assistant for Emergency Medical Services

uva-dsa/ems-pipeline • • 11 Mar 2024

Emergency Medical Services (EMS) responders often operate under time-sensitive conditions, facing cognitive overload and inherent risks, requiring essential skills in critical thinking and rapid decision-making.

11 Mar 2024

Paper
Code

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

dalision/modalbiasavsr • 7 Mar 2024

In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason.

07 Mar 2024

Paper
Code

Speech Recognition

Benchmarks Add a Result

Libraries

Datasets

Subtasks

Latest papers

Content

Benchmarks

Add a Result