Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment

6 May 2022 · Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang, James Glass

Automatic pronunciation assessment is an important technology for helping self-directed language learners. While pronunciation quality has multiple aspects, including accuracy, fluency, completeness, and prosody, previous efforts typically model only one aspect (e.g., accuracy) at one granularity (e.g., the phoneme level). In this work, we explore modeling multi-aspect pronunciation assessment at multiple granularities. Specifically, we train a Goodness of Pronunciation (GOP) feature-based Transformer (GOPT) with multi-task learning. Experiments show that GOPT achieves the best results on speechocean762 with a public automatic speech recognition (ASR) acoustic model trained on Librispeech.
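GOPT consumes GOP features and is trained on several scoring targets at once. The sketch below is a simplified illustration, not the authors' recipe. It assumes frame-level log phone posteriors from some ASR acoustic model and a forced alignment are already available, and computes a log phone posterior (LPP) plus log posterior ratios (LPR) for one aligned phone, which is one common GOP formulation. The multi-task objective is assumed to be a weighted sum of mean squared errors over (granularity, aspect) heads with equal default weights. The function names `gop_features` and `multitask_loss` and the weighting scheme are hypothetical.

```python
from typing import Dict, Hashable, Optional, Sequence

import numpy as np


def gop_features(log_post: np.ndarray, start: int, end: int, canonical: int) -> np.ndarray:
    """Simplified GOP-style features for one aligned phone (illustrative only).

    log_post:  (T, P) frame-level log phone posteriors from an ASR acoustic model.
    start/end: frame span [start, end) that a forced alignment assigns to the phone.
    canonical: index of the phone the speaker was expected to pronounce.
    Returns [LPP, LPR_0, ..., LPR_{P-1}], a vector of length 1 + P.
    """
    segment = log_post[start:end]        # frames belonging to this phone
    lpp_all = segment.mean(axis=0)       # mean log posterior of every phone
    lpp = lpp_all[canonical]             # LPP of the canonical phone
    lpr = lpp - lpp_all                  # LPR against each phone q: LPP(p) - LPP(q)
    return np.concatenate(([lpp], lpr))


def multitask_loss(
    preds: Dict[Hashable, Sequence[float]],
    targets: Dict[Hashable, Sequence[float]],
    weights: Optional[Dict[Hashable, float]] = None,
) -> float:
    """Weighted sum of per-head MSE losses (hypothetical weighting scheme).

    Keys name one (granularity, aspect) head, e.g. ("phone", "accuracy").
    Heads missing from `weights` (or all heads, if it is None) get weight 1.0.
    """
    total = 0.0
    for head, pred in preds.items():
        w = 1.0 if weights is None else weights.get(head, 1.0)
        err = np.asarray(pred, dtype=float) - np.asarray(targets[head], dtype=float)
        total += w * float(np.mean(err ** 2))
    return total


# Toy usage: a 3-frame phone segment over a 2-phone inventory (made-up posteriors).
posteriors = np.log(np.array([[0.8, 0.2], [0.6, 0.4], [0.9, 0.1]]))
feats = gop_features(posteriors, start=0, end=3, canonical=0)   # shape (3,)

preds = {("phone", "accuracy"): [1.0, 2.0], ("utterance", "fluency"): [3.0]}
targets = {("phone", "accuracy"): [1.0, 1.0], ("utterance", "fluency"): [5.0]}
loss = multitask_loss(preds, targets)   # 0.5 + 4.0 = 4.5
```

Sharing one encoder across all heads and summing their losses is what lets a single model score several aspects at phone, word, and utterance level; in practice the per-head weights would be a tuning choice.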


Code

YuanGongND/gopt (official)

Tasks

Automatic Speech Recognition

Automatic Speech Recognition (ASR)

Multi-Task Learning

Phone-level pronunciation scoring

speech-recognition

Speech Recognition

Utterance-level pronunciation scoring

Word-level pronunciation scoring

Datasets

LibriSpeech

speechocean762

Results from the Paper

Ranked #2 on Phone-level pronunciation scoring on speechocean762 (using extra training data)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Phone-level pronunciation scoring	speechocean762	GOPT-PAII	Pearson correlation coefficient (PCC)	0.68	#2
Phone-level pronunciation scoring	speechocean762	GOPT-Librispeech	Pearson correlation coefficient (PCC)	0.61	#6
Word-level pronunciation scoring	speechocean762	GOPT-PAII	Pearson correlation coefficient (PCC)	0.60	#2
Word-level pronunciation scoring	speechocean762	GOPT-Librispeech	Pearson correlation coefficient (PCC)	0.55	#4
Utterance-level pronunciation scoring	speechocean762	GOPT-PAII	Pearson correlation coefficient (PCC)	0.73	#4
Utterance-level pronunciation scoring	speechocean762	GOPT-Librispeech	Pearson correlation coefficient (PCC)	0.74	#3
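Every value in the table is a Pearson correlation coefficient between model-predicted scores and human-annotated scores, so 1.0 would mean perfect linear agreement with the annotators. A minimal NumPy sketch of the metric follows; the function name `pearson_cc` and the example scores are hypothetical, and inputs with zero variance are not handled (they would divide by zero).

```python
import numpy as np


def pearson_cc(pred, human) -> float:
    """Pearson correlation between predicted and human scores (1-D sequences)."""
    x = np.asarray(pred, dtype=float)
    y = np.asarray(human, dtype=float)
    xc = x - x.mean()                     # center both score vectors
    yc = y - y.mean()
    # Covariance over the product of standard deviations; the 1/n factors cancel.
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Hypothetical scores for four utterances, cross-checked against NumPy.
pred = [1.8, 1.2, 2.0, 0.5]
human = [2, 1, 2, 0]
assert np.isclose(pearson_cc(pred, human), np.corrcoef(pred, human)[0, 1])
```

Because PCC is invariant to shifting and scaling the predictions, it rewards getting the ranking and relative spacing of scores right rather than matching the annotators' absolute scale.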

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
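Several entries in this list (Multi-Head Attention, Scaled Dot-Product Attention, Softmax) describe the Transformer's core operation, softmax(QKᵀ/√d_k)V. The single-head NumPy sketch below is illustrative only: in a GOPT-style model one could treat each phone's projected GOP feature vector as a token, but the shapes, sizes, and names here are assumptions and are not taken from the official YuanGongND/gopt code.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v).
    Returns the (n_q, d_v) output and the (n_q, n_k) attention weights.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)           # similarity of each query to each key
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


# Hypothetical self-attention over 5 "phone tokens" with 8-dim features.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, 8))
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
```

Multi-head attention runs several such heads on learned linear projections of the same tokens and concatenates their outputs; the scaling by √d_k keeps the dot products from saturating the softmax as the feature dimension grows.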
