Speech Recognition

1089 papers with code • 316 benchmarks • 87 datasets

Speech Recognition is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them into written form. The goal is accurate transcription, either in real time or from recorded audio, that stays robust to factors such as accents, speaking speed, and background noise.
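Transcription accuracy is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The page itself does not define a metric, so the following is only a minimal sketch under that assumption. The function name `word_error_rate` and the plain whitespace tokenization are illustrative choices, not taken from any benchmark listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Real evaluations usually normalize text (casing, punctuation, number formats) before scoring, which this sketch leaves out.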


Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

akreal/bloomzmms 16 Apr 2024

Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.

Stars: 0

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

leduckhai/multimed 8 Apr 2024

VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration.

Stars: 7

CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models

neulab/cmulab 3 Apr 2024

Effectively using Natural Language Processing (NLP) tools in under-resourced languages requires a thorough understanding of the language itself, familiarity with the latest models and training methodologies, and technical expertise to deploy these models.

Stars: 8

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition

ahaliassos/raven 2 Apr 2024

In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data.

Stars: 40

Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal

gauthelo/kallaama-speech-dataset 2 Apr 2024

To build such technologies, we provide textual corpora in Wolof and Pulaar, and a pronunciation lexicon containing 49,132 entries from the Wolof dataset.

Stars: 3

FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

y0ngjaenius/cvpr2024_flowerformer 19 Mar 2024

The success of a specific neural network architecture is closely tied to the dataset and task it tackles; there is no one-size-fits-all solution.

Stars: 9

SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages

ankilab/spoken-100 14 Mar 2024

Benchmarking plays a pivotal role in assessing and enhancing the performance of compact deep learning models designed for execution on resource-constrained devices, such as microcontrollers.

Stars: 0

SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation

speechcolab/leaderboard 13 Mar 2024

In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation.

Stars: 379

Real-Time Multimodal Cognitive Assistant for Emergency Medical Services

uva-dsa/ems-pipeline 11 Mar 2024

Emergency Medical Services (EMS) responders often operate under time-sensitive conditions, facing cognitive overload and inherent risks, requiring essential skills in critical thinking and rapid decision-making.

Stars: 13

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

dalision/modalbiasavsr 7 Mar 2024

In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason.

Stars: 4