Speech Recognition

691 papers with code • 275 benchmarks • 180 datasets

Speech recognition is the task of recognising speech within audio and converting it into text.
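
Many end-to-end recognisers output a probability distribution over symbols for every audio frame, and the text is recovered by decoding those frame-level outputs. Below is a minimal sketch of one common scheme, greedy CTC decoding. It assumes the per-frame probabilities are already available as plain lists and that index 0 is the CTC blank symbol. The vocabulary and probabilities in the usage example are made-up toy values.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 of the vocabulary is the CTC blank symbol


def ctc_greedy_decode(
    frame_probs: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank: int = BLANK,
) -> str:
    """Greedy CTC decoding: best symbol per frame, merge repeats, drop blanks."""
    # Most likely symbol index for each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev: Optional[int] = None
    for idx in best_path:
        # Emit a symbol only if it differs from the previous frame and is not blank;
        # a blank between two identical symbols therefore keeps both of them.
        if idx != prev and idx != blank:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


if __name__ == "__main__":
    vocab = ["_", "c", "a", "t"]  # toy vocabulary, "_" is the blank
    probs = [
        [0.1, 0.8, 0.05, 0.05],   # c
        [0.1, 0.7, 0.1, 0.1],     # c (repeat, merged)
        [0.9, 0.05, 0.03, 0.02],  # blank
        [0.1, 0.1, 0.7, 0.1],     # a
        [0.1, 0.1, 0.1, 0.7],     # t
        [0.2, 0.1, 0.1, 0.6],     # t (repeat, merged)
    ]
    print(ctc_greedy_decode(probs, vocab))  # prints "cat"
```

Greedy decoding is the simplest option. Beam search, often combined with a language model, usually gives lower error rates at a higher compute cost.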

Distilling a Pretrained Language Model to a Multilingual ASR Model

juice500ml/xlm_to_xlsr 25 Jun 2022

Hence, we are motivated to distill the rich knowledge embedded inside a well-trained teacher text model to the student speech model.
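
The excerpt does not say how the teacher's knowledge is transferred. A common, generic form of knowledge distillation trains the student to match the teacher's softened output distribution using a temperature-scaled KL divergence, as in the sketch below. This is not necessarily the loss used in xlm_to_xlsr. It assumes, hypothetically, that teacher and student emit logits over the same vocabulary at aligned positions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature (higher = softer)."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by temperature**2, a common convention that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # toy logits
    print(distillation_loss(teacher, teacher))          # 0.0: identical outputs
    print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # > 0: student disagrees
```

In practice this term is usually mixed with the ordinary supervised ASR loss, with a weighting chosen per task.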

TEVR: Improving Speech Recognition by Token Entropy Variance Reduction

fxtentacle/wav2vec2-xls-r-1b-tevr 25 Jun 2022

This paper presents TEVR, a speech recognition model designed to minimize the variation in token entropy w.r.t. …
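
The abstract excerpt is truncated, so the exact quantity TEVR minimizes is not fully stated here. The sketch below is a loose, hypothetical illustration of "variation in token entropy" and is not the paper's procedure. It assumes each token of a candidate tokenization already has a probability from some language model, converts each probability to per-token information content (-log2 p, in bits), and reports the population variance. Under this toy measure, a tokenization whose tokens all carry equal information scores 0. The probabilities are invented.

```python
import math
from statistics import pvariance
from typing import List, Sequence


def token_information(probs: Sequence[float]) -> List[float]:
    """Per-token information content in bits: -log2 p(token | context)."""
    return [-math.log2(p) for p in probs]


def information_variance(probs: Sequence[float]) -> float:
    """Population variance of per-token information for one tokenization."""
    return pvariance(token_information(probs))


if __name__ == "__main__":
    # Hypothetical LM probabilities for two tokenizations of the same sentence.
    balanced = [0.5, 0.5, 0.5, 0.5]  # every token carries exactly 1 bit
    skewed = [0.99, 0.01, 0.5]       # near-certain tokens mixed with a surprising one
    print(information_variance(balanced))  # 0.0
    print(information_variance(skewed) > information_variance(balanced))  # True
```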

SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

facebookresearch/sound-spaces 16 Jun 2022

We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments.
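
Geometry-based acoustic simulators such as SoundSpaces compute room impulse responses (RIRs) between a sound source and a listener. A standard way to hear a source at that position is to convolve the dry source signal with the RIR. The sketch below shows only that convolution step in plain Python. It does not use the SoundSpaces API, and the toy signal and three-tap RIR are made up.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution: output length is len(signal) + len(rir) - 1."""
    if not signal or not rir:
        return []
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        # Each input sample adds a scaled, delayed copy of the impulse response.
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    dry = [1.0, 0.0, 0.5]  # toy "dry" source samples
    rir = [1.0, 0.0, 0.3]  # toy RIR: direct path plus one echo two samples later
    print(convolve(dry, rir))
```

This direct loop costs O(N·M). For realistic RIR lengths, an FFT-based routine such as `scipy.signal.fftconvolve` is the usual choice.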

Revisiting End-to-End Speech-to-Text Translation From Scratch

bzhangGo/zero 9 Jun 2022

Finally, we discuss neural acoustic feature modeling, where a neural model is designed to extract acoustic features directly from raw speech signals, with the goal of simplifying inductive biases and giving the model more freedom in describing speech.

NNTrainer: Light-Weight On-Device Training Framework

nnstreamer/nntrainer 9 Jun 2022

Vendors have recently started running intelligence services on devices to keep personal data on the device and to reduce network and cloud costs.

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

jctian98/e2e_lfmmi 5 Jun 2022

Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks.
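
The LAE excerpt highlights discriminating between languages at the frame level in code-switched speech. To make that output concrete, the sketch below assumes a model has already produced per-frame language posteriors. That model is hypothetical here and is not the LAE encoder itself. The code labels each frame with its most likely language and merges consecutive frames into segments. Frame indices and language codes are illustrative.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (language, start_frame, end_frame_exclusive)


def language_segments(
    frame_posteriors: Sequence[Sequence[float]],
    languages: Sequence[str],
) -> List[Segment]:
    """Label each frame with its argmax language and merge runs into segments."""
    segments: List[Segment] = []
    for t, post in enumerate(frame_posteriors):
        lang = languages[max(range(len(post)), key=post.__getitem__)]
        if segments and segments[-1][0] == lang:
            # Same language as the previous frame: extend the current segment.
            segments[-1] = (lang, segments[-1][1], t + 1)
        else:
            segments.append((lang, t, t + 1))
    return segments


if __name__ == "__main__":
    # Hypothetical posteriors over (Mandarin, English) for five frames.
    posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.7, 0.3]]
    print(language_segments(posteriors, ["zh", "en"]))
    # [('zh', 0, 2), ('en', 2, 4), ('zh', 4, 5)]
```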

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

chorowski-lab/hcpc 5 Jun 2022

The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones.

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

kssteven418/squeezeformer 2 Jun 2022

After reexamining the design choices for both the macro and micro-architecture of Conformer, we propose the Squeezeformer model, which consistently outperforms the state-of-the-art ASR models under the same training schemes.

Training Efficient CNNs: Tweaking the Nuts and Bolts of Neural Networks for Lighter, Faster and Robust Models

sabeesh90/MosaicML_Augmentations_Efficient_Deep_Learning_MLDS_2022 23 May 2022

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more.

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

mikewangwzhl/vidil 22 May 2022

The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.