Automatic Speech Recognition (ASR)

481 papers with code • 7 benchmarks • 23 datasets

Automatic Speech Recognition (ASR) is the task of converting spoken language into written text, often in real time, letting people interact with computers, mobile devices, and other technology using their voice. The goal is to transcribe speech accurately despite variations in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade audio quality.
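
For a concrete feel of the task, here is a minimal sketch of transcribing an audio file with an off-the-shelf model, assuming the Hugging Face transformers package and the public facebook/wav2vec2-base-960h checkpoint; the file name speech_sample.wav is a placeholder.

```python
from transformers import pipeline

# Load a pretrained ASR pipeline (model choice is an illustrative assumption).
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# Transcribe a local audio file (hypothetical path, ideally 16 kHz mono).
result = asr("speech_sample.wav")
print(result["text"])
```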

Libraries

Use these libraries to find Automatic Speech Recognition (ASR) models and implementations

Most implemented papers

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

carlini/audio_adversarial_examples 5 Jan 2018

We construct targeted audio adversarial examples on automatic speech recognition.
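The core optimization is simple to sketch. Below is a hedged illustration of the general idea, not the paper's exact attack: gradient descent on the CTC loss with respect to an additive perturbation, where model (a waveform-to-log-probabilities network) and target_ids are assumed stand-ins.

```python
import torch

def targeted_attack(model, waveform, target_ids, steps=100, lr=1e-3, eps=0.01):
    # Optimize an additive perturbation `delta` so the model transcribes
    # the waveform as the attacker-chosen token sequence `target_ids`.
    delta = torch.zeros_like(waveform, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    ctc = torch.nn.CTCLoss(blank=0)
    for _ in range(steps):
        log_probs = model(waveform + delta)  # assumed shape: (time, 1, vocab)
        input_len = torch.tensor([log_probs.size(0)])
        target_len = torch.tensor([target_ids.size(0)])
        loss = ctc(log_probs, target_ids.unsqueeze(0), input_len, target_len)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation imperceptibly small
    return (waveform + delta).detach()
```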

Deep Audio-Visual Speech Recognition

lordmartian/deep_avsr 6 Sep 2018

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.
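A hedged sketch of the general audio-visual idea, not the paper's architecture: encode each modality with its own network and fuse the streams before classification. All module sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AVFusion(nn.Module):
    def __init__(self, audio_dim=80, video_dim=512, hidden=256, vocab=40):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.video_enc = nn.GRU(video_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, vocab)

    def forward(self, audio, video):
        a, _ = self.audio_enc(audio)  # (batch, time, hidden)
        v, _ = self.video_enc(video)  # frame rates assumed pre-aligned
        return self.classifier(torch.cat([a, v], dim=-1))
```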

Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation

qiujiali/lattice_rnn 30 Oct 2018

The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word.
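As a baseline illustration of what such confidence scores look like (not the paper's lattice RNN), one can threshold per-word posterior probabilities from the 1-best hypothesis; word_log_probs here is a hypothetical tensor of log posteriors.

```python
import torch

def flag_uncertain_words(words, word_log_probs, threshold=0.8):
    # Convert log posteriors to probabilities and flag words whose
    # confidence falls below the threshold for manual review.
    confidences = word_log_probs.exp()
    return [(w, c.item()) for w, c in zip(words, confidences) if c < threshold]
```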

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

MS-Mind/MS-Code-01 9 Nov 2019

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.
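As a rough illustration of shrinking a network for edge deployment, here is PyTorch's built-in post-training dynamic quantization; note this is a generic API, not the fully quantized Transformer scheme the paper proposes.

```python
import torch
import torch.nn as nn

# A tiny stand-in model; any trained ASR network would go here.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 40))

# Store Linear-layer weights in int8, reducing model size and CPU latency.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```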

Multi-modal Dense Video Captioning

v-iashin/MDVC 17 Mar 2020

We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside video frames and the corresponding audio track.
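A hedged sketch of the general recipe of treating ASR output as an extra input stream; the module and its dimensions are illustrative assumptions, not the MDVC architecture.

```python
import torch
import torch.nn as nn

class SpeechVideoFusion(nn.Module):
    def __init__(self, vocab=10000, text_dim=300, video_dim=1024, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, text_dim)
        self.text_enc = nn.GRU(text_dim, hidden, batch_first=True)
        self.proj = nn.Linear(video_dim + hidden, hidden)

    def forward(self, asr_tokens, video_feats):
        # Encode the ASR transcript, pool it, and broadcast it across
        # the video timeline before fusing with the visual features.
        t, _ = self.text_enc(self.embed(asr_tokens))  # (batch, words, hidden)
        t = t.mean(dim=1, keepdim=True).expand(-1, video_feats.size(1), -1)
        return self.proj(torch.cat([video_feats, t], dim=-1))
```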

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

kssteven418/squeezeformer 2 Jun 2022

After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.
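For context, a minimal sketch of the Conformer-style convolution module that this line of work re-examines: pointwise expansion with a gated linear unit, a depthwise convolution, then a pointwise projection. Kernel size and width are illustrative assumptions, and this is not Squeezeformer's exact block.

```python
import torch.nn as nn

class ConvModule(nn.Module):
    def __init__(self, dim=256, kernel=31):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(dim, 2 * dim, 1),                                    # pointwise expansion
            nn.GLU(dim=1),                                                 # gated linear unit halves channels
            nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim),  # depthwise conv
            nn.BatchNorm1d(dim),
            nn.SiLU(),
            nn.Conv1d(dim, dim, 1),                                        # pointwise projection
        )

    def forward(self, x):        # x: (batch, dim, time)
        return x + self.net(x)   # residual connection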

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

wnhsu/FactorizedHierarchicalVAE NeurIPS 2017

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.
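A hedged sketch of the factorization idea, not the paper's model: one latent summarizes sequence-level attributes (e.g. speaker identity) while another captures segment-level content, each with its own Gaussian posterior. Sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FactorizedEncoder(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, z_dim=32):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.seq_head = nn.Linear(hidden, 2 * z_dim)  # sequence-level latent
        self.seg_head = nn.Linear(hidden, 2 * z_dim)  # segment-level latent

    @staticmethod
    def sample(stats):
        # Reparameterization trick: stats holds (mu, logvar).
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, segments):  # (batch, time, feat_dim)
        h, _ = self.rnn(segments)
        z_seq = self.sample(self.seq_head(h.mean(dim=1)))  # pooled over time
        z_seg = self.sample(self.seg_head(h[:, -1]))       # last frame per segment
        return z_seq, z_seg
```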

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation

kaldi-asr/kaldi 12 May 2018

We present recent developments in Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM corpus, from 2012 and 2014.
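TED-LIUM release 3 is also available through torchaudio's built-in dataset wrapper; a minimal loading sketch, assuming the corpus has been downloaded under data/.

```python
import torchaudio

# Load the TED-LIUM release 3 training subset from a local copy of the corpus.
dataset = torchaudio.datasets.TEDLIUM("data/", release="release3", subset="train")

# Each item is (waveform, sample_rate, transcript, talk_id, speaker_id, identifier).
waveform, sample_rate, transcript, talk_id, speaker_id, identifier = dataset[0]
print(sample_rate, transcript)
```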

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

NVIDIA/OpenSeq2Seq 25 May 2018

We present OpenSeq2Seq, a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
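The same mixed-precision recipe is available in PyTorch via torch.cuda.amp; a minimal training-loop sketch (the paper's implementation is TensorFlow-based), where model, optimizer, and loader are assumed to exist and the model is assumed to return its loss.

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid fp16 underflow
for inputs, targets in loader:        # `loader` is an assumed DataLoader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in float16 where safe
        loss = model(inputs, targets) # assumed: model computes and returns the loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```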

Quaternion Recurrent Neural Networks

mravanelli/pytorch-kaldi ICLR 2019

Recurrent neural networks (RNNs) are powerful architectures for modelling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence.
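
At the heart of quaternion networks is the Hamilton product, which replaces the standard matrix-vector product; a minimal sketch, where q and p are tensors whose last dimension holds the (r, x, y, z) quaternion components.

```python
import torch

def hamilton_product(q, p):
    # Multiply two quaternion tensors component-wise along the last axis.
    r1, x1, y1, z1 = q.unbind(-1)
    r2, x2, y2, z2 = p.unbind(-1)
    return torch.stack([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,  # real part
        r1*x2 + x1*r2 + y1*z2 - z1*y2,  # i component
        r1*y2 - x1*z2 + y1*r2 + z1*x2,  # j component
        r1*z2 + x1*y2 - y1*x2 + z1*r2,  # k component
    ], dim=-1)
```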