880 papers with code • 5 benchmarks • 2 datasets




Most implemented papers

Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges

autoliuweijie/FastBERT 8 Mar 2021

Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others.

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

TensorSpeech/TensorFlowASR 7 May 2020

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without an external language model (LM), 1.9%/4.1% with an LM, and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.

ISyNet: Convolutional Neural Networks design for AI accelerator

mindspore-ai/models 4 Sep 2021

To address this problem, we propose the matrix efficiency measure (MEM), a measure of the hardware efficiency of a neural architecture search space; a search space comprising hardware-efficient operations; a latency-aware scaling method; and ISyNet, a set of architectures designed to be both fast on specialized neural processing unit (NPU) hardware and accurate.

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

facebookresearch/salina 3 Apr 2015

Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients.
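The paper's remedy (often called IRNN) is to pair ReLU units with a recurrent weight matrix initialized to the identity and zero biases, so the hidden state is copied forward unchanged when inputs are silent. A minimal NumPy sketch of that initialization (dimensions and the input-weight scale here are illustrative, not from the paper):

```python
import numpy as np

def irnn_step(W_hh, W_xh, b, h, x):
    """One step of a ReLU RNN: h_t = relu(W_hh @ h + W_xh @ x + b)."""
    return np.maximum(0.0, W_hh @ h + W_xh @ x + b)

hidden, inputs = 4, 3
rng = np.random.default_rng(0)

# IRNN initialization: recurrent weights = identity, biases = zero.
# With ReLU activations, a non-negative hidden state is then carried
# forward intact in the absence of input, so gradients neither vanish
# nor explode along the recurrence at initialization.
W_hh = np.eye(hidden)
W_xh = rng.normal(scale=0.01, size=(hidden, inputs))
b = np.zeros(hidden)

h = np.ones(hidden)
for _ in range(100):  # 100 steps of zero input
    h = irnn_step(W_hh, W_xh, b, h, np.zeros(inputs))
print(h)  # state preserved: [1. 1. 1. 1.]
```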

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

upskyy/Transformer-Transducer 7 Feb 2020

We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy.
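The streaming trick described above amounts to masking self-attention so each frame can only look a bounded distance into the past. A small sketch of such a mask (the window size and layout are illustrative; the paper's actual configuration may differ):

```python
import numpy as np

def limited_left_context_mask(seq_len, left_context):
    """Boolean attention mask where position i may attend only to
    positions j with i - left_context <= j <= i: causal attention
    with a bounded look-back, which keeps per-frame decoding cost
    constant instead of growing with the audio length."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j >= i - left_context)

mask = limited_left_context_mask(seq_len=6, left_context=2)
# Frame 5 attends only to frames 3, 4, 5:
print(mask[5].astype(int))  # [0 0 0 1 1 1]
```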

Unsupervised Cross-lingual Representation Learning for Speech Recognition

huggingface/transformers 24 Jun 2020

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

microsoft/unilm 26 Oct 2021

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.

Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

pytorch/fairseq Preprint 2022

While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind.

Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

PaddlePaddle/PaddleSpeech 12 Aug 2014

This approach to decoding enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems.
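The network in this paper emits per-frame character distributions that are decoded with the standard CTC collapse rule: merge repeated labels, then drop blanks. The paper combines this with a language model in a first-pass beam search; the greedy variant below is a simplified sketch (the label numbering is made up for the example):

```python
def ctc_collapse(frame_labels, blank=0):
    """Collapse a per-frame best-path labeling into an output
    sequence: merge consecutive repeats, then remove blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Frames spelling "hello" with 0 as blank (h=8, e=5, l=12, o=15);
# the blank between the two l's keeps them from being merged.
frames = [0, 8, 8, 0, 5, 12, 12, 0, 12, 15]
print(ctc_collapse(frames))  # [8, 5, 12, 12, 15] -> "hello"
```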