Automatic Speech Recognition
500 papers with code • 158 benchmarks • 11 datasets
Libraries
Use these libraries to find Automatic Speech Recognition models and implementations.
Most implemented papers
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
We construct targeted audio adversarial examples on automatic speech recognition.
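The idea can be sketched with a toy, fully differentiable stand-in model: optimize a small perturbation toward a target label while projecting it back into an L∞ ball so it stays quiet. This is a minimal sketch of targeted projected gradient descent, not the paper's CTC-based attack on a real ASR system; the model, sizes, and step parameters are all illustrative.

```python
import numpy as np

def toy_logits(x, W):
    # Stand-in "model": a linear map from waveform samples to class scores.
    return W @ x

def cross_entropy(logits, target):
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[target])

def targeted_attack(x, W, target, eps=0.05, lr=0.005, steps=200):
    """Projected gradient descent toward `target`, perturbation bounded by eps."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        logits = toy_logits(x + delta, W)
        p = np.exp(logits - logits.max()); p /= p.sum()
        grad_logits = p.copy(); grad_logits[target] -= 1.0  # d(CE)/d(logits)
        delta -= lr * (W.T @ grad_logits)
        delta = np.clip(delta, -eps, eps)   # keep the perturbation small
    return delta

rng = np.random.default_rng(0)
x = rng.standard_normal(64)           # "waveform"
W = rng.standard_normal((5, 64))      # toy model weights
target = 3
delta = targeted_attack(x, W, target)
loss_clean = cross_entropy(toy_logits(x, W), target)
loss_adv = cross_entropy(toy_logits(x + delta, W), target)
print(loss_clean, loss_adv, float(np.abs(delta).max()))
```

The clipping step is what makes the perturbation adversarial rather than merely different audio: the target loss drops while the waveform change stays within a fixed amplitude budget.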
Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation
The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word.
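As a minimal sketch of that standard approach, one can threshold per-word confidences to flag likely misrecognitions for re-scoring or rejection; the hypothesis format and threshold here are illustrative, not from the paper.

```python
# Flag low-confidence words in an ASR hypothesis (illustrative threshold).
def flag_low_confidence(hyp, threshold=0.5):
    """hyp: list of (word, confidence) pairs from an ASR decoder."""
    return [(w, c, c < threshold) for w, c in hyp]

hyp = [("the", 0.97), ("quick", 0.91), ("brie", 0.32), ("fox", 0.88)]
flags = flag_low_confidence(hyp)
suspect = [w for w, c, bad in flags if bad]
print(suspect)  # words worth re-scoring or rejecting
```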
A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.
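The basic building block behind fully quantized models is uniform fixed-point quantization of weights. A minimal sketch of symmetric per-tensor 8-bit "fake" quantization is below; the paper's exact scheme (per-layer scales, activation quantization, training-time handling) may differ.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Symmetric per-tensor quantization: float weights -> int8 + scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = np.abs(w).max() / qmax            # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.default_rng(0).standard_normal(256).astype(np.float32)
q, scale = fake_quantize(w)
w_hat = q.astype(np.float32) * scale          # dequantized weights
err = float(np.abs(w - w_hat).max())
print(q.dtype, err)
```

The int8 tensor plus a single float scale is what makes the model fit on edge devices: storage drops 4x versus float32, and the worst-case rounding error is half a quantization step.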
Multi-modal Dense Video Captioning
We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside video frames and the corresponding audio track.
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.
TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation
We present the recent development on Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014.
Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq
We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
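The core trick in mixed-precision training is dynamic loss scaling: multiply the loss so float16 gradients don't underflow, then unscale before the update, shrinking the scale whenever a gradient overflows. A minimal NumPy sketch of that control logic is below (illustrative class and parameter names; OpenSeq2Seq wires the equivalent into TensorFlow optimizers).

```python
import numpy as np

class LossScaler:
    """Dynamic loss scaling sketch: grow on clean steps, back off on overflow."""
    def __init__(self, scale=2.0**15, growth=2.0, backoff=0.5):
        self.scale, self.growth, self.backoff = scale, growth, backoff

    def step(self, scaled_grads_fp16):
        # Unscale gradients back to their true magnitude in float32.
        grads = [g.astype(np.float32) / self.scale for g in scaled_grads_fp16]
        if any(not np.all(np.isfinite(g)) for g in grads):
            self.scale *= self.backoff   # overflow: shrink scale, skip update
            return None
        self.scale *= self.growth        # clean step: grow scale back
        return grads

scaler = LossScaler(scale=2.0**15)
g_true = np.array([1.0, 2.0], dtype=np.float32)

# A gradient of 2.0 scaled by 2**15 exceeds float16's max (65504) -> inf.
overflowed = scaler.step([(g_true * scaler.scale).astype(np.float16)])
# After backing off to 2**14, the same gradients fit and are recovered exactly.
recovered = scaler.step([(g_true * scaler.scale).astype(np.float16)])
print(overflowed, scaler.scale, recovered)
```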
Quaternion Recurrent Neural Networks
Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence.
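The distinguishing operation in a quaternion network is the Hamilton product, which mixes the four components of each hidden unit instead of treating them as independent reals. A minimal sketch (the surrounding recurrent machinery is omitted):

```python
# Hamilton product of two quaternions (w, x, y, z) — the non-commutative
# multiplication at the heart of quaternion-valued layers.
def hamilton(p, q):
    a, b, c, d = p
    e, f, g, h = q
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(hamilton(i, j))  # i * j = k, i.e. (0, 0, 0, 1)
print(hamilton(j, i))  # j * i = -k: order matters
```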
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise.
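The fusion idea can be sketched as a per-frame softmax over modality scores, so the model leans on lip features when the audio is noisy. The shapes and scoring function below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def attention_fuse(audio, visual, w_a, w_v):
    """Per-frame attention over two modality streams of shape (T, D)."""
    # One scalar score per modality per frame, softmax-normalized.
    scores = np.stack([audio @ w_a, visual @ w_v], axis=-1)   # (T, 2)
    scores -= scores.max(axis=-1, keepdims=True)
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=-1, keepdims=True)
    # Convex combination of the two streams, per frame.
    return alpha[:, :1] * audio + alpha[:, 1:] * visual, alpha

rng = np.random.default_rng(0)
T, D = 4, 8
audio, visual = rng.standard_normal((T, D)), rng.standard_normal((T, D))
w_a, w_v = rng.standard_normal(D), rng.standard_normal(D)
fused, alpha = attention_fuse(audio, visual, w_a, w_v)
print(fused.shape, alpha.sum(axis=-1))  # weights sum to 1 per frame
```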