Speech Representation Learning

46 papers with code • 0 benchmarks • 0 datasets

Speech representation learning aims to learn general-purpose representations of raw speech, typically via self-supervised or unsupervised pre-training on large amounts of unlabeled audio, so that the learned features transfer to downstream tasks such as speech recognition, speech translation, and lip reading.

Most implemented papers

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

huggingface/transformers 14 Jun 2021

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation.
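
Since the listed implementation is huggingface/transformers, a minimal sketch of extracting frame-level HuBERT representations from a raw waveform might look like the following; the checkpoint name and the 16 kHz dummy input are assumptions about a typical setup, not part of the paper.

```python
# Minimal sketch: extract frame-level HuBERT representations with Hugging Face Transformers.
# The "facebook/hubert-base-ls960" checkpoint and the 16 kHz dummy waveform are assumptions.
import torch
from transformers import AutoFeatureExtractor, HubertModel

checkpoint = "facebook/hubert-base-ls960"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = HubertModel.from_pretrained(checkpoint)
model.eval()

waveform = torch.randn(16000)  # one second of dummy 16 kHz audio in place of real speech

inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch, frames, hidden_size): one representation per ~20 ms frame
print(outputs.last_hidden_state.shape)
```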

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning 25 Oct 2019

We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech.
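
A rough sketch of that masked-acoustic-model objective: mask spectrogram frames, encode them with a bidirectional Transformer, and reconstruct the masked frames. Dimensions, masking ratio, and loss details here are simplified assumptions rather than the exact Mockingjay configuration.

```python
# Sketch of Mockingjay-style masked acoustic modeling (illustrative sizes only).
import torch
import torch.nn as nn

n_mels, d_model, mask_ratio = 80, 256, 0.15

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=3,
)
proj_in = nn.Linear(n_mels, d_model)    # spectrogram frame -> model dimension
proj_out = nn.Linear(d_model, n_mels)   # reconstruct the original frame

spec = torch.randn(8, 200, n_mels)                # (batch, frames, mel bins) of unlabeled speech
mask = torch.rand(8, 200) < mask_ratio            # choose frames to mask
masked = spec.masked_fill(mask.unsqueeze(-1), 0)  # zero out the selected frames

recon = proj_out(encoder(proj_in(masked)))
loss = nn.functional.l1_loss(recon[mask], spec[mask])  # reconstruction loss on masked frames only
loss.backward()
```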

Unsupervised speech representation learning using WaveNet autoencoders

bshall/ZeroSpeech 25 Jan 2019

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms.
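
One of the bottlenecks explored in this line of work is vector quantization (VQ-VAE style). A simplified sketch of such a discrete bottleneck is shown below; the codebook size and latent dimension are assumptions for illustration, and the WaveNet decoder itself is omitted.

```python
# Sketch of a discrete (VQ-VAE-style) bottleneck: continuous encoder outputs are
# snapped to the nearest codebook entry, with a straight-through estimator so
# gradients still reach the encoder. Sizes are illustrative.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                # z: (batch, frames, dim)
        flat = z.reshape(-1, z.size(-1))                 # (batch*frames, dim)
        dists = torch.cdist(flat, self.codebook.weight)  # distance to every code
        codes = dists.argmin(dim=-1).view(z.shape[:-1])  # nearest-code indices
        q = self.codebook(codes)                         # quantized latents
        q = z + (q - z).detach()                         # straight-through estimator
        return q, codes

vq = VectorQuantizer()
z = torch.randn(4, 100, 64)   # stand-in for encoder outputs on 4 utterances
q, codes = vq(z)              # discrete latents a WaveNet decoder would condition on
print(q.shape, codes.shape)
```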

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

cywang97/unispeech 19 Jan 2021

In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner.
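
A conceptual sketch of such a multi-task objective, combining a phonetic CTC term on labeled speech with a contrastive term on unlabeled speech, follows; the weighting, tensor shapes, and contrastive formulation here are simplified assumptions rather than the exact UniSpeech recipe.

```python
# Conceptual sketch: supervised CTC loss + contrastive self-supervised loss in one objective.
# Shapes, temperature, and the interpolation weight are assumptions for illustration.
import torch
import torch.nn.functional as F

alpha = 0.5                                                # assumed interpolation weight

# --- supervised branch: phonetic CTC on labeled utterances ---
log_probs = torch.randn(100, 4, 42).log_softmax(-1)        # (frames, batch, phone vocab)
targets = torch.randint(1, 42, (4, 20))                    # phonetic transcripts
ctc = F.ctc_loss(log_probs, targets,
                 input_lengths=torch.full((4,), 100),
                 target_lengths=torch.full((4,), 20))

# --- self-supervised branch: contrastive loss on unlabeled utterances ---
context = F.normalize(torch.randn(4, 100, 256), dim=-1)    # contextual features
quantized = F.normalize(torch.randn(4, 100, 256), dim=-1)  # positive (quantized) targets
logits = torch.einsum("btd,bsd->bts", context, quantized) / 0.1  # similarity / temperature
labels = torch.arange(100).expand(4, 100)                  # positive is the aligned frame
contrastive = F.cross_entropy(logits.reshape(-1, 100), labels.reshape(-1))

loss = alpha * ctc + (1 - alpha) * contrastive             # joint multi-task objective
```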

An Unsupervised Autoregressive Model for Speech Representation Learning

iamyuanchung/Autoregressive-Predictive-Coding 5 Apr 2019

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.
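
The core idea, predicting a frame several steps ahead from past frames with an autoregressive model, can be sketched as follows; layer sizes and the prediction horizon are illustrative assumptions.

```python
# Sketch of autoregressive predictive coding (APC): an RNN reads past spectrogram
# frames and is trained to predict a frame `shift` steps ahead with an L1 loss.
import torch
import torch.nn as nn

n_mels, hidden, shift = 80, 512, 3               # horizon of 3 frames is an assumption

rnn = nn.LSTM(n_mels, hidden, num_layers=3, batch_first=True)
head = nn.Linear(hidden, n_mels)

spec = torch.randn(8, 200, n_mels)               # (batch, frames, mel bins), unlabeled speech
out, _ = rnn(spec[:, :-shift])                   # encode only the past frames
pred = head(out)                                 # predict the frame `shift` steps ahead
loss = nn.functional.l1_loss(pred, spec[:, shift:])
loss.backward()
```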

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

facebookresearch/fairseq 7 Aug 2021

In particular, when compared to published models such as conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.
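
A conceptual sketch of combining the two objectives named in the title, a contrastive term plus a masked-prediction cross-entropy over quantizer token IDs, is given below; the shapes, masking ratio, and equal loss weighting are assumptions for illustration, not the exact w2v-BERT setup.

```python
# Conceptual sketch: contrastive loss on a lower module + masked prediction of
# quantizer token IDs on an upper module, summed into one training objective.
import torch
import torch.nn.functional as F

batch, frames, dim, codebook_size = 4, 100, 256, 1024
mask = torch.rand(batch, frames) < 0.3                  # masked frame positions

# contrastive module: align context vectors with their quantized targets
context = F.normalize(torch.randn(batch, frames, dim), dim=-1)
quantized = F.normalize(torch.randn(batch, frames, dim), dim=-1)
sims = torch.einsum("btd,bsd->bts", context, quantized) / 0.1
frame_ids = torch.arange(frames).expand(batch, frames)
contrastive = F.cross_entropy(sims[mask], frame_ids[mask])

# masked-prediction module: classify the quantizer token ID at each masked frame
token_ids = torch.randint(0, codebook_size, (batch, frames))  # targets from the quantizer
logits = torch.randn(batch, frames, codebook_size)            # upper-module predictions
mlm = F.cross_entropy(logits[mask], token_ids[mask])

loss = contrastive + mlm                                      # joint objective
```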

Sampling strategies in Siamese Networks for unsupervised speech representation learning

bootphon/abnet3 30 Apr 2018

We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement over the state of the art in unsupervised representation learning using Siamese networks.
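
An illustrative sketch of Siamese training on discovered word pairs follows; the pair-sampling proportion and the cosine-based loss are assumptions rather than the paper's exact setup.

```python
# Sketch: sample same-word and different-word pairs from unsupervisedly discovered
# word tokens, pull same pairs together and push different pairs apart.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 128))

# discovered word tokens: (cluster id, pooled frame features) - dummy data here
tokens = [(random.randrange(20), torch.randn(80)) for _ in range(1000)]

def sample_pair(p_same=0.5):
    """Sample a token pair; with probability p_same both come from the same cluster."""
    a_id, a = random.choice(tokens)
    same = random.random() < p_same
    pool = [feat for cid, feat in tokens if (cid == a_id) == same]
    return a, random.choice(pool), same

a, b, same = sample_pair()
za, zb = encoder(a), encoder(b)
cos = F.cosine_similarity(za, zb, dim=0)
# pull same-word pairs together, push different-word pairs below a margin
loss = (1 - cos) if same else F.relu(cos - 0.5)
loss.backward()
```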

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

pytorch/fairseq 17 Nov 2021

On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

facebookresearch/av_hubert ICLR 2022

The lip-reading WER is further reduced to 26.9% when using all 433 hours of labeled data from LRS3 and combined with self-training.