Speech Recognition

883 papers with code • 315 benchmarks • 195 datasets

Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into written form. The goal is an accurate transcript, produced either in real time or from recorded audio, that holds up under variation in accent, speaking speed, and background noise.

(Image credit: SpecAugment)
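
Transcription accuracy, as described above, is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The page itself does not name a metric, so the following is only a minimal sketch under that assumption. The function name `word_error_rate` and the lowercase-and-split-on-whitespace normalization are illustrative choices, not taken from any listed paper or library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are compared after lowercasing and splitting on
    whitespace; real evaluations often apply heavier text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.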

DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model

backspacetg/distilxlsr 2 Jun 2023

Multilingual self-supervised speech representation models have greatly enhanced the speech recognition performance for low-resource languages, and the compression of these huge models has also become a crucial prerequisite for their industrial application.

02 Jun 2023

Improved DeepFake Detection Using Whisper Features

piotrkawa/deepfake-whisper-features 2 Jun 2023

With a recent influx of voice generation methods, the threat introduced by audio DeepFake (DF) is ever-increasing.

02 Jun 2023

SlothSpeech: Denial-of-service Attack Against Speech Recognition Models

0xrutvij/slothspeech 1 Jun 2023

We show that popular ASR models such as Speech2Text and Whisper perform input-dependent computation, so their efficiency varies with the input.

01 Jun 2023

Perception and Semantic Aware Regularization for Sequential Confidence Calibration

husterpzh/pssr CVPR 2023

In this work, we find that tokens/sequences with high perceptual and semantic correlation with the target ones carry more relevant and effective information and thus enable more effective regularization.

31 May 2023

Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator

briansidp/espnet 30 May 2023

The incorporation of biasing words obtained through contextual knowledge is of paramount importance in automatic speech recognition (ASR) applications.

30 May 2023

CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

speechbrain/speechbrain 29 May 2023

We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish).

29 May 2023

HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

speechbrain/speechbrain 29 May 2023

In particular, multi-head HyperConformer achieves comparable or higher recognition performance while being more efficient than Conformer in terms of inference speed, memory, parameter count, and available training data.

29 May 2023

BIG-C: a Multimodal Multi-Purpose Dataset for Bemba

csikasote/bigc 26 May 2023

We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba.

26 May 2023

Unit-based Speech-to-Speech Translation Without Parallel Data

ajd12342/unit-speech-translation 24 May 2023

We propose an unsupervised speech-to-speech translation (S2ST) system that does not rely on parallel data between the source and target languages.

24 May 2023

Scaling Speech Technology to 1,000+ Languages

facebookresearch/fairseq arXiv 2023

Expanding the language coverage of speech technology has the potential to improve access to information for many more people.

23 May 2023