Speech Recognition

1089 papers with code • 316 benchmarks • 87 datasets

Speech Recognition, also called automatic speech recognition (ASR), is the task of converting spoken language into text: recognizing the words spoken in an audio signal and transcribing them into written form. The goal is to transcribe speech accurately, either in real time or from recorded audio, while coping with factors such as accents, speaking rate, and background noise.



Language and Speech Technology for Central Kurdish Varieties

sinaahmadi/cordi 4 Mar 2024

Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties.

8 stars

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

tbenst/silent_speech 2 Mar 2024

To the best of our knowledge, this work represents the first instance where noninvasive silent speech recognition on an open vocabulary has cleared the threshold of 15% word error rate (WER), demonstrating that silent speech interfaces (SSIs) can be a viable alternative to automatic speech recognition (ASR).

4 stars
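The 15% WER threshold above refers to word error rate, conventionally defined as (S + D + I) / N: the number of word substitutions, deletions, and insertions needed to align the hypothesis with the reference, divided by the number of words in the reference. The sketch below computes it with a word-level Levenshtein distance. It assumes plain whitespace tokenization and no text normalization (casing, punctuation), which real benchmarks usually apply first; the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes whitespace tokenization and no normalization (illustrative only).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion ("the"): 2 errors / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Because insertions are counted, WER is not capped at 100%: a hypothesis much longer than the reference can score above 1.0.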

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

sally-sh/vsp-llm 23 Feb 2024

In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements.

268 stars

HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention

chrischen1023/hint 22 Feb 2024

In this paper, we propose an end-to-end High-quality INpainting Transformer, abbreviated as HINT, which consists of a novel mask-aware pixel-shuffle downsampling module (MPD) to preserve the visible information extracted from the corrupted image while maintaining the integrity of the information available for high-level inferences made within the model.

14 stars
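The HINT abstract builds its downsampling module on pixel shuffling, which can preserve visible information because it only rearranges pixels instead of discarding any. As a rough illustration, the sketch below implements plain pixel-unshuffle (space-to-depth), folding each r x r block of an image into the channel dimension. It is not the paper's mask-aware MPD module, whose mask handling is not described here; the function name, the nested-list `image[y][x][c]` layout, and the channel ordering are assumptions chosen for a dependency-free example.

```python
def pixel_unshuffle(image, r):
    """Space-to-depth: H x W x C nested lists -> (H/r) x (W/r) x (C*r*r).

    Every input value is copied exactly once, so the rearrangement is
    lossless, unlike pooling or strided convolution.
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    out = []
    for y in range(0, h, r):
        row = []
        for x in range(0, w, r):
            # Gather the r x r block into one longer channel vector,
            # ordered by input channel, then block row, then block column.
            pixel = [image[y + dy][x + dx][ch]
                     for ch in range(c) for dy in range(r) for dx in range(r)]
            row.append(pixel)
        out.append(row)
    return out


# A 4 x 4 single-channel image whose values are 0..15 in row-major order.
img = [[[y * 4 + x] for x in range(4)] for y in range(4)]
out = pixel_unshuffle(img, 2)
print(out[0][0])  # [0, 1, 4, 5]
```

The output has a quarter of the spatial positions but four times the channels, and all 16 original values are still present.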

How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena

hlt-mt/fbk-fairseq 20 Feb 2024

The attention mechanism, a cornerstone of state-of-the-art neural models, faces computational hurdles in processing long sequences due to its quadratic complexity.

28 stars
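The quadratic cost mentioned in the ConfHyena abstract comes from scoring every query against every key. The pure-Python sketch below implements ordinary scaled dot-product attention (not the Hyena operator the paper proposes) and counts the query-key scores it computes; for self-attention over n frames that count is n². The helper name `naive_attention` and the toy inputs are illustrative assumptions.

```python
import math


def naive_attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns (outputs, scores_computed); scores_computed equals
    len(queries) * len(keys), i.e. n * n for self-attention.
    """
    d = len(queries[0])
    outputs = []
    scores_computed = 0
    for q in queries:
        # One score per key: this loop nested inside the query loop is the O(n^2) part.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        scores_computed += len(scores)
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs, scores_computed


for n in (50, 100, 200):
    x = [[float(i % 7), 1.0] for i in range(n)]  # toy sequence of 2-dim frames
    _, count = naive_attention(x, x, x)
    print(n, count)  # prints 50 2500, then 100 10000, then 200 40000
```

Doubling the sequence length quadruples the number of scores. Speech makes this acute: assuming a typical 10 ms frame shift and no subsampling, 30 seconds of audio is 3,000 frames, or 9 million scores per attention head per layer.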

DeepCover: Advancing RNN Test Coverage and Online Error Prediction using State Machine Extraction

pouriagr/deep-cover 10 Feb 2024

The proposed methodology, along with its assessment metrics, contributes to increasing explainability in RNN models by providing a clear representation of their internal decision-making process through the extracted state machine (SM).

1 star

Streaming Sequence Transduction through Dynamic Compression

steventan0110/star 2 Feb 2024

We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams.

1 star

On Speaker Attribution with SURT

k2-fsa/icefall 28 Jan 2024

The Streaming Unmixing and Recognition Transducer (SURT) has recently become a popular framework for continuous, streaming, multi-talker speech recognition (ASR).

771 stars

Towards Event Extraction from Speech with Contextual Clues

jodie-kang/speechee 27 Jan 2024

While text-based event extraction has been an active research area and has seen successful application in many domains, extracting semantic events from speech directly is an under-explored problem.

0 stars