Speech Recognition

1108 papers with code • 233 benchmarks • 88 datasets

Speech Recognition is the task of converting spoken language into text. It involves recognizing the words spoken in an audio recording and transcribing them into a written format. The goal is to transcribe speech accurately, either in real time or from recorded audio, while accounting for factors such as accents, speaking speed, and background noise.

No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation

qiaoqiao2323/robot-speech-intelligibility 15 May 2024

For this, the robot needs to know how difficult it is for a user to understand spoken language in a particular setting.

Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

zpforlove/rene 13 May 2024

In patient disease prediction tests on the ICBHI database, the architecture exhibited improvements of 23% in the mean of average score and harmonic score compared to the baseline.

SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset

SoccerNet/sn-echoes 12 May 2024

The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics.

Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models

rainavyas/prepend_acoustic_attack 9 May 2024

Our experiments demonstrate that the same, universal 0.64-second adversarial audio segment can successfully mute a target Whisper ASR model for over 97% of speech samples.

Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems

SMIL-SPCRAS/AVCRFormer Expert Systems with Applications 2024

The article introduces a novel audio-visual speech command recognition transformer (AVCRFormer) specifically designed for robust AVSR.

09 May 2024

Open Implementation and Study of BEST-RQ for Speech Processing

speechbrain/speechbrain 7 May 2024

BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) is a self-supervised learning (SSL) method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such as wav2vec 2.0.
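The random-projection quantizer named in this summary can be illustrated with a small sketch. It is not the SpeechBrain implementation: the function names, the Gaussian initialization, and the tiny dimensions are assumptions chosen for illustration. The idea shown is that a frozen random matrix projects each speech frame, and the index of the nearest entry in a frozen random codebook becomes the discrete label that a model would be trained to predict for masked frames.

```python
import math
import random


def _normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def make_quantizer(input_dim, code_dim, codebook_size, seed=0):
    """Build a frozen random projection and codebook (hypothetical helper).

    Assumption: both are drawn from a standard normal distribution; the
    initialization in a real system may differ.
    """
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
                  for _ in range(code_dim)]
    codebook = [_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
                for _ in range(codebook_size)]
    return projection, codebook


def quantize(frame, projection, codebook):
    """Return the index of the codebook entry nearest to the projected frame."""
    projected = _normalize(
        [sum(w * x for w, x in zip(row, frame)) for row in projection]
    )

    def squared_distance(code):
        return sum((p - c) ** 2 for p, c in zip(projected, code))

    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k]))


# Neither the projection nor the codebook is ever trained.
projection, codebook = make_quantizer(input_dim=4, code_dim=3, codebook_size=8)
label = quantize([0.5, -1.0, 0.25, 2.0], projection, codebook)
```

Because nothing in the quantizer is learned, the labels can be computed on the fly from raw features, which is one reason the approach is described as simpler than methods that train their quantizer.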

Mixat: A Data Set of Bilingual Emirati-English Speech

mbzuai-nlp/mixat 4 May 2024

This paper introduces Mixat: a dataset of Emirati speech code-mixed with English.

Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment

zzadiues3338/asr-energy-jetson 2 May 2024

By analyzing WER and transcription speed across models using FP32, FP16, and INT8 quantization on clean and noisy datasets, we highlight the crucial trade-offs between accuracy, speed, quantization, energy efficiency, and memory needs.
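For readers unfamiliar with WER (word error rate), the metric this summary relies on, a minimal sketch follows. It is not taken from the paper's repository: the function name and the whitespace tokenization are assumptions. WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a bounded accuracy score.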

Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition

clearloveyuan/after 1 May 2024

To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called After, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency.

Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

badripatro/mamba360 24 Apr 2024

This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data.
