TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Emotion Recognition	IEMOCAP	Partially Fine-tuned HuBERT Large	WA	0.796	# 5
Slot Filling	SLURP	Partially Fine-tuned HuBERT	F1	0.753	# 2
Intent Classification	SLURP	Partially Fine-tuned HuBERT	Accuracy (%)	87.51	# 2
Speaker Verification	VoxCeleb1	Fine-tuned HuBERT Large	EER	2.36	# 2

A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

4 Nov 2021 · Yingzhi Wang, Abdelmoumene Boumadane, Abdelwahab Heba

Speech self-supervised models such as wav2vec 2.0 and HuBERT are making revolutionary progress in Automatic Speech Recognition (ASR). However, they have not yet been fully shown to improve performance on tasks other than ASR. In this work, we explore partial fine-tuning and entire fine-tuning of wav2vec 2.0 and HuBERT pre-trained models on three non-ASR speech tasks: Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding. With the simple downstream frameworks we propose, the best scores reach 79.58% weighted accuracy in the speaker-dependent setting and 73.01% weighted accuracy in the speaker-independent setting for Speech Emotion Recognition on IEMOCAP, 2.36% equal error rate for Speaker Verification on VoxCeleb1, and 89.38% accuracy for Intent Classification and 78.92% F1 for Slot Filling on SLURP, showing the strength of fine-tuned wav2vec 2.0 and HuBERT in learning prosodic, voice-print and semantic representations.
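To make two of the metrics quoted in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names are hypothetical, weighted accuracy is assumed to mean the overall fraction of correctly classified utterances (a common convention in IEMOCAP work), and the equal error rate is approximated at the score threshold where the false acceptance and false rejection rates are closest.

```python
from typing import Sequence, Tuple


def weighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall fraction of correctly classified utterances (assumed WA definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate EER for verification trials.

    scores: similarity score per trial (higher means more likely same speaker).
    labels: 1 for a target (same-speaker) trial, 0 for an impostor trial.
    Returns (eer, threshold) at the candidate threshold where FAR and FRR are closest.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    n_target = sum(labels)
    n_impostor = len(labels) - n_target
    if n_target == 0 or n_impostor == 0:
        raise ValueError("need at least one target and one impostor trial")

    best = None  # (gap, eer, threshold)
    for thr in sorted(set(scores)):
        # A trial is accepted when its score is at or above the threshold.
        false_accepts = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        false_rejects = sum(1 for s, l in zip(scores, labels) if s < thr and l == 1)
        far = false_accepts / n_impostor
        frr = false_rejects / n_target
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    print(weighted_accuracy([0, 0, 0, 1], [0, 0, 1, 1]))
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
    print(equal_error_rate(toy_scores, toy_labels))
```

The threshold sweep is quadratic in the number of trials, which is fine for a toy example; a full VoxCeleb1 trial list would call for a sorted, cumulative-count implementation instead.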

Tasks

Automatic Speech Recognition (ASR)

Emotion Recognition

Intent Classification

Slot Filling

Speaker Verification

Speech Emotion Recognition

Speech Recognition

Spoken Language Understanding

Datasets

LibriSpeech

IEMOCAP

VoxCeleb1

SLURP

Results from the Paper

Ranked #2 on Speaker Verification on VoxCeleb1
