Generative Spoken Language Modeling from Raw Audio

We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), together with a set of metrics to automatically evaluate the learned representations at the acoustic and linguistic levels for both encoding and generation. We set up baseline systems consisting of a discrete speech encoder (returning pseudo-text units), a generative language model (trained on pseudo-text), and a speech decoder (generating a waveform from pseudo-text), all trained without supervision, and validate the proposed metrics with human evaluation. Across three speech encoders (CPC, wav2vec 2.0, HuBERT), we find that the number of discrete units (50, 100, or 200) matters in a task-dependent and encoder-dependent way, and that some combinations approach text-based systems.
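
The pipeline can be pictured as three composable stages: encoder, unit language model, decoder. The toy Python sketch below is illustrative only; the paper's actual components are self-supervised features (CPC, wav2vec 2.0, or HuBERT) quantized with k-means, a Transformer language model over the resulting units, and a neural TTS decoder. Every function here is a deliberately simplified stand-in, not the authors' code.

```python
import numpy as np

N_UNITS = 50   # size of the discrete unit inventory (the paper tries 50/100/200)
FRAME = 160    # 10 ms frames at 16 kHz, matching typical unit rates

def encode(wav: np.ndarray) -> np.ndarray:
    """Speech encoder: raw audio -> discrete pseudo-text units.
    Here: bucket per-frame log-energy into N_UNITS bins (a stand-in for
    k-means clustering over self-supervised features)."""
    frames = wav[: len(wav) // FRAME * FRAME].reshape(-1, FRAME)
    energy = np.log1p((frames ** 2).mean(axis=1))
    bins = np.linspace(energy.min(), energy.max() + 1e-6, N_UNITS + 1)
    return np.clip(np.digitize(energy, bins) - 1, 0, N_UNITS - 1)

def train_lm(unit_seqs):
    """'Language model': smoothed bigram counts over pseudo-text
    (a stand-in for a Transformer LM trained on unit strings)."""
    counts = np.ones((N_UNITS, N_UNITS))  # add-one smoothing
    for seq in unit_seqs:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def generate(lm, prompt, n_steps=100, seed=0):
    """Sample a continuation of a unit prompt from the bigram LM."""
    rng = np.random.default_rng(seed)
    seq = list(prompt)
    for _ in range(n_steps):
        seq.append(rng.choice(N_UNITS, p=lm[seq[-1]]))
    return np.array(seq)

def decode(units: np.ndarray, sr=16000) -> np.ndarray:
    """Speech decoder: units -> waveform. Here each unit becomes a short
    sine tone (a stand-in for a TTS model plus neural vocoder)."""
    t = np.arange(FRAME) / sr
    return np.concatenate([np.sin(2 * np.pi * (100 + 20 * u) * t) for u in units])

# End-to-end: encode a prompt, continue it with the LM, resynthesize audio.
wav = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # 1 s dummy input
units = encode(wav)
lm = train_lm([units])
out_wav = decode(generate(lm, units[:10]))
```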


Results from the Paper


Task          Dataset       Model       Metric   Value   Global Rank
Resynthesis   LibriSpeech   CPC         PER      14.23   #2
Resynthesis   LibriSpeech   CPC         CER       8.29   #1
Resynthesis   LibriSpeech   CPC         MOS       3.54   #1
Resynthesis   LibriSpeech   HuBERT-L6   PER      16.68   #1
Resynthesis   LibriSpeech   HuBERT-L6   CER      11.85   #2
Resynthesis   LibriSpeech   HuBERT-L6   MOS       3.49   #2
Resynthesis   LJSpeech      CPC         PER       8.74   #2
Resynthesis   LJSpeech      CPC         CER       9.20   #1
Resynthesis   LJSpeech      CPC         MOS       3.85   #1
Resynthesis   LJSpeech      HuBERT-L6   PER      11.45   #1
Resynthesis   LJSpeech      HuBERT-L6   CER      11.02   #2
Resynthesis   LJSpeech      HuBERT-L6   MOS       3.69   #2

PER = phone error rate (%, lower is better); CER = character error rate (%, lower is better); MOS = mean opinion score (1–5 scale, higher is better).
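
Roughly, PER and CER are obtained by transcribing the resynthesized audio with an ASR system and aligning the transcript against the reference, while MOS comes from human listeners. As a hedged sketch (not the paper's evaluation code), the normalized edit distance underlying both error rates can be computed as follows:

```python
def error_rate(ref: list, hyp: list) -> float:
    """Levenshtein edit distance between token sequences, normalized by
    reference length and scaled to a percentage. Over characters this
    gives CER; over phone sequences, PER."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
                d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution
            )
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

# e.g. error_rate(list("hello world"), list("helo wurld")) -> 18.18... (CER %)
```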

Methods


No methods listed for this paper.