Search Results

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

24 code implementations 8 Dec 2015

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.

Accented Speech Recognition, End-To-End Speech Recognition, +1 more

Unsupervised Cross-lingual Representation Learning for Speech Recognition

3 code implementations 24 Jun 2020

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization, Representation Learning, +1 more

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

2 code implementations Asian Chapter of the Association for Computational Linguistics 2020

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.

End-To-End Speech Recognition, Machine Translation, +3 more

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

11 code implementations NeurIPS 2020

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Quantization, Self-Supervised Learning, +1 more

Tacotron: Towards End-to-End Speech Synthesis

23 code implementations 29 Mar 2017

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.

Speech Synthesis, Text-To-Speech Synthesis

Unsupervised Speech Recognition

1 code implementation 24 May 2021

Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe.

Speech Recognition, Unsupervised Speech Recognition

A Spectral Energy Distance for Parallel Speech Synthesis

2 code implementations NeurIPS 2020

Speech synthesis is an important practical generative modeling problem that has seen great progress over the last few years, with likelihood-based autoregressive neural models now outperforming traditional concatenative systems.

Speech Synthesis

FRILL: A Non-Semantic Speech Embedding for Mobile Devices

1 code implementation 9 Nov 2020

In this work, we propose a class of lightweight non-semantic speech embedding models that run efficiently on mobile devices based on the recently proposed TRILL speech embedding.

Sound, Audio and Speech Processing