Automatic Speech Recognition

500 papers with code • 158 benchmarks • 11 datasets

Most implemented papers

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

carlini/audio_adversarial_examples 5 Jan 2018

We construct targeted audio adversarial examples on automatic speech recognition.

Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation

qiujiali/lattice_rnn 30 Oct 2018

The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word.

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

MS-Mind/MS-Code-01 9 Nov 2019

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.

Multi-modal Dense Video Captioning

v-iashin/MDVC 17 Mar 2020

We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside video frames and the corresponding audio track.

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

kssteven418/squeezeformer 2 Jun 2022

After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

wnhsu/FactorizedHierarchicalVAE NeurIPS 2017

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation

kaldi-asr/kaldi 12 May 2018

We present the recent development on Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014.

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

NVIDIA/OpenSeq2Seq 25 May 2018

We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.

Quaternion Recurrent Neural Networks

mravanelli/pytorch-kaldi ICLR 2019

Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence.

Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition

georgesterpu/Sigmedia-AVSR 5 Sep 2018

Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise.