Speech Recognition

1087 papers with code • 316 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into written form. The goal is to transcribe speech accurately, either in real time or from recorded audio, while coping with factors such as accents, speaking rate, and background noise.

(Image credit: SpecAugment)


Latest papers with no code

Noise Masking Attacks and Defenses for Pretrained Speech Models

no code yet • 2 Apr 2024

Our method fine-tunes the encoder to produce an ASR model and then performs noise masking on this model. We find that this recovers private information from the pretraining data, even though the model never saw transcripts during pretraining.

Transfer Learning from Whisper for Microscopic Intelligibility Prediction

no code yet • 2 Apr 2024

Macroscopic intelligibility models predict the expected human word error rate for a given speech-in-noise stimulus.
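Word error rate, the quantity these models predict for human listeners, is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis, divided by the number of reference words. A minimal sketch of the metric itself, assuming whitespace tokenization and no text normalization; it is not the paper's intelligibility model:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (second "the"): 2 errors / 6 words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Production toolkits usually normalize casing and punctuation before scoring; this sketch compares tokens exactly as given.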

Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models

no code yet • 31 Mar 2024

We identify subgroups of audio recordings based on combinations of these metadata and compute each subgroup's performance (e.g., word error rate) and the difference in performance ("divergence") with respect to the overall population.
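The per-subgroup comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: per-utterance error counts are taken as already computed, subgroup WER is pooled at corpus level (total errors over total reference words), divergence is taken as subgroup WER minus overall WER, and the metadata field names are hypothetical.

```python
from collections import defaultdict


def subgroup_divergence(utterances, keys):
    """Map each metadata combination to (subgroup WER, divergence from overall WER).

    Each utterance is a dict holding precomputed 'errors' (word edits) and
    'ref_words' (reference length), plus the metadata fields listed in `keys`.
    """
    overall = (sum(u["errors"] for u in utterances)
               / sum(u["ref_words"] for u in utterances))

    errors = defaultdict(int)
    ref_words = defaultdict(int)
    for u in utterances:
        group = tuple(u[k] for k in keys)  # e.g. ("f", "old")
        errors[group] += u["errors"]
        ref_words[group] += u["ref_words"]

    result = {}
    for group in errors:
        wer = errors[group] / ref_words[group]
        result[group] = (wer, wer - overall)  # positive: worse than the overall population
    return result


# Hypothetical metadata and error counts
data = [
    {"gender": "f", "age": "young", "errors": 2, "ref_words": 20},
    {"gender": "f", "age": "old", "errors": 6, "ref_words": 20},
    {"gender": "m", "age": "young", "errors": 2, "ref_words": 20},
    {"gender": "m", "age": "old", "errors": 2, "ref_words": 20},
]
ranked = sorted(subgroup_divergence(data, ["gender", "age"]).items(),
                key=lambda kv: -kv[1][1])  # largest divergence first
for group, (wer, div) in ranked:
    print(group, f"WER={wer:.2f}", f"divergence={div:+.2f}")
```

Pooling errors rather than averaging per-utterance WER keeps long utterances from being underweighted; passing fewer `keys` gives coarser subgroups.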

ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models

no code yet • 29 Mar 2024

Our experiments with recent long-context LLMs on ELITR-Bench highlight a gap between open-source and proprietary models, especially when questions are asked sequentially within a conversation.

LV-CTC: Non-autoregressive ASR with CTC and latent variable models

no code yet • 28 Mar 2024

In this paper, we propose a new model that combines CTC with a latent variable model, a state-of-the-art approach in neural machine translation research.
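Because CTC scores every output frame independently, a CTC model can emit all frames in parallel, which is what makes it non-autoregressive. The simplest decoder, greedy (best-path) decoding, takes the top symbol per frame, merges consecutive repeats, and then drops blanks. A minimal sketch, assuming per-frame scores are given as a list of lists with the blank at index 0; LV-CTC's latent-variable component is not shown:

```python
from typing import List, Sequence


def ctc_greedy_decode(scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding over per-frame scores of shape (T, V)."""
    decoded: List[int] = []
    prev = None
    for frame in scores:
        # Pick the highest-scoring symbol for this frame independently of the others.
        label = max(range(len(frame)), key=lambda v: frame[v])
        # Keep a symbol only if it differs from the previous frame's symbol
        # (merging repeats) and is not the blank.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Toy vocabulary: 0 = blank, 1 = "a", 2 = "b"
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.8, 0.1],    # a again: merged with the previous frame
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.7, 0.2],    # a: kept, since a blank intervened
    [0.2, 0.1, 0.7],    # b
    [0.8, 0.1, 0.1],    # blank
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The blank is what lets CTC represent genuinely repeated symbols: without the blank frame between them, the two "a" frames would collapse into one.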

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

no code yet • 28 Mar 2024

Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks.

ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus

no code yet • 27 Mar 2024

We present ZAEBUC-Spoken, a multilingual multidialectal Arabic-English speech corpus.

DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition

no code yet • 26 Mar 2024

End-to-end automatic speech recognition (E2E ASR) systems often suffer from mistranscription of domain-specific phrases, such as named entities, sometimes leading to catastrophic failures in downstream tasks.
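DANCER conditions its corrections on entity descriptions. For contrast, a much simpler baseline snaps each transcribed word to the closest entry in a domain entity list by string similarity. A minimal sketch, assuming single-word entities and Python's `difflib` similarity ratio; the entity list and the `cutoff` value are hypothetical:

```python
import difflib
from typing import Sequence


def correct_entities(transcript: str, entities: Sequence[str], cutoff: float = 0.8) -> str:
    """Replace each word with the most similar known entity if similarity >= cutoff."""
    known = {e.lower(): e for e in entities}  # lowercase key -> canonical spelling
    corrected = []
    for word in transcript.split():
        # get_close_matches ranks candidates by difflib.SequenceMatcher ratio.
        match = difflib.get_close_matches(word.lower(), list(known), n=1, cutoff=cutoff)
        corrected.append(known[match[0]] if match else word)
    return " ".join(corrected)


# Hypothetical biomedical entity list
print(correct_entities("the patient takes metforman daily", ["Ozempic", "Metformin"]))
```

A fixed cutoff trades off missed corrections against false replacements of ordinary words, and string similarity alone cannot distinguish entities that are spelled or pronounced alike.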

Extracting Biomedical Entities from Noisy Audio Transcripts

no code yet • 26 Mar 2024

Our dataset offers a comprehensive collection of almost 2,000 clean and noisy recordings.