TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Spoken language identification	KALAKA-3	Model on the noisy data	PC	0.055	# 2
Spoken language identification	KALAKA-3	Model on the noisy data	PO	0.083	# 2
Spoken language identification	KALAKA-3	Model on the noisy data	EC	0.033	# 2
Spoken language identification	KALAKA-3	Model on the noisy data	EO	0.059	# 2
Spoken language identification	KALAKA-3	Model on the automatically filtered (cleaned) data	PC	0.041	# 1
Spoken language identification	KALAKA-3	Model on the automatically filtered (cleaned) data	PO	0.056	# 1
Spoken language identification	KALAKA-3	Model on the automatically filtered (cleaned) data	EC	0.022	# 1
Spoken language identification	KALAKA-3	Model on the automatically filtered (cleaned) data	EO	0.058	# 1
Spoken language identification	LRE07	Kaldi i-vector	3 sec	26.04	# 9
Spoken language identification	LRE07	Kaldi i-vector	10 sec	11.93	# 9
Spoken language identification	LRE07	Kaldi i-vector	30 sec	4.52	# 9
Spoken language identification	LRE07	Kaldi i-vector	Average	14.17	# 9
Spoken language identification	LRE07	Kaldi i-vector DNN	3 sec	19.67	# 8
Spoken language identification	LRE07	Kaldi i-vector DNN	10 sec	7.84	# 8
Spoken language identification	LRE07	Kaldi i-vector DNN	30 sec	3.31	# 8
Spoken language identification	LRE07	Kaldi i-vector DNN	Average	10.27	# 8
Spoken language identification	LRE07	GMM-MMI	3 sec	17.28	# 6
Spoken language identification	LRE07	GMM-MMI	10 sec	5.90	# 6
Spoken language identification	LRE07	GMM-MMI	30 sec	2.10	# 7
Spoken language identification	LRE07	GMM-MMI	Average	8.42	# 6
Spoken language identification	LRE07	CNN-SAP	3 sec	8.59	# 2
Spoken language identification	LRE07	CNN-SAP	10 sec	2.49	# 1
Spoken language identification	LRE07	CNN-SAP	30 sec	1.09	# 1
Spoken language identification	LRE07	CNN-SAP	Average	4.06	# 2
Spoken language identification	LRE07	CNN-LDE	3 sec	8.25	# 1
Spoken language identification	LRE07	CNN-LDE	10 sec	2.61	# 2
Spoken language identification	LRE07	CNN-LDE	30 sec	1.16	# 2
Spoken language identification	LRE07	CNN-LDE	Average	4.00	# 1
Spoken language identification	LRE07	Resnet34 (cleaned data)	3 sec	9.39	# 3
Spoken language identification	LRE07	Resnet34 (cleaned data)	10 sec	3.14	# 3
Spoken language identification	LRE07	Resnet34 (cleaned data)	30 sec	1.90	# 6
Spoken language identification	LRE07	Resnet34 (cleaned data)	Average	4.81	# 3
Spoken language identification	LRE07	Resnet34 (noisy data)	3 sec	10.58	# 4
Spoken language identification	LRE07	Resnet34 (noisy data)	10 sec	3.33	# 4
Spoken language identification	LRE07	Resnet34 (noisy data)	30 sec	1.72	# 5
Spoken language identification	LRE07	Resnet34 (noisy data)	Average	5.21	# 4
Spoken language identification	LRE07	Fusion of models	3 sec	15.29	# 5
Spoken language identification	LRE07	Fusion of models	10 sec	4.54	# 5
Spoken language identification	LRE07	Fusion of models	30 sec	1.30	# 3
Spoken language identification	LRE07	Fusion of models	Average	7.04	# 5
Spoken language identification	LRE07	Phonotactic	3 sec	18.59	# 7
Spoken language identification	LRE07	Phonotactic	10 sec	6.28	# 7
Spoken language identification	LRE07	Phonotactic	30 sec	1.34	# 4
Spoken language identification	LRE07	Phonotactic	Average	8.73	# 7
Spoken language identification	VOXLINGUA107	Cleaned	0..5sec	13.4	# 2
Spoken language identification	VOXLINGUA107	Cleaned	5..20sec	6.6	# 2
Spoken language identification	VOXLINGUA107	Cleaned	Average	7.6	# 2
Spoken language identification	VOXLINGUA107	Noisy	0..5sec	12.3	# 1
Spoken language identification	VOXLINGUA107	Noisy	5..20sec	6.1	# 1
Spoken language identification	VOXLINGUA107	Noisy	Average	7.1	# 1
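
The Global Rank column appears to order models by ascending metric value within each dataset and metric, so lower is better. The minimal sketch below recomputes ranks for a few rows copied from the table under that assumption. The `ROWS` subset and the `rank_rows` helper are illustrative and not part of the page, and because only three models are included, the positions are relative to that subset.

```python
from collections import defaultdict

# A few (dataset, model, metric, value) rows copied from the table above.
ROWS = [
    ("LRE07", "CNN-SAP", "3 sec", 8.59),
    ("LRE07", "CNN-LDE", "3 sec", 8.25),
    ("LRE07", "Resnet34 (cleaned data)", "3 sec", 9.39),
    ("LRE07", "CNN-SAP", "10 sec", 2.49),
    ("LRE07", "CNN-LDE", "10 sec", 2.61),
    ("LRE07", "Resnet34 (cleaned data)", "10 sec", 3.14),
]


def rank_rows(rows):
    """Rank models per (dataset, metric); the lowest value gets rank 1."""
    groups = defaultdict(list)
    for dataset, model, metric, value in rows:
        groups[(dataset, metric)].append((value, model))

    ranks = {}
    for (dataset, metric), entries in groups.items():
        # Sorting the (value, model) pairs orders them by ascending value.
        for position, (_value, model) in enumerate(sorted(entries), start=1):
            ranks[(dataset, model, metric)] = position
    return ranks


if __name__ == "__main__":
    for key, position in sorted(rank_rows(ROWS).items()):
        print(key, "->", position)
```

For these rows the recomputed positions agree with the ranks listed in the table: CNN-LDE leads on 3 sec, CNN-SAP leads on 10 sec, and Resnet34 (cleaned data) is third on both.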

VOXLINGUA107: A DATASET FOR SPOKEN LANGUAGE RECOGNITION

25 Nov 2020 · Jorgen Valk, Tanel Alumae

This paper investigates the use of automatically collected web audio data for the task of spoken language recognition. We generate semi-random search phrases from language-specific Wikipedia data, which are then used to retrieve videos from YouTube for 107 languages. Speech activity detection and speaker diarization are used to extract the segments of the videos that contain speech. Post-filtering removes segments that are likely not in the given language, increasing the proportion of correctly labeled segments to 98%, based on crowd-sourced verification. The resulting training set (VoxLingua107) contains 6628 hours of speech (62 hours per language on average) and is accompanied by an evaluation set of 1609 verified utterances. We use the data to build language recognition models for several spoken language identification tasks. Experiments show that models trained on the automatically retrieved data give results competitive with those obtained using hand-labeled proprietary datasets. The dataset is publicly available.
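
The first step of the pipeline, generating semi-random search phrases from Wikipedia text, can be illustrated with a minimal sketch. The abstract does not say how phrases are sampled, so the approach here (drawing short runs of consecutive words at random positions) is an assumption, not the authors' method. The function name `sample_search_phrases`, the default phrase length of three words, and the Estonian sample sentence are all hypothetical.

```python
import random


def sample_search_phrases(text, n_phrases, phrase_len=3, seed=0):
    """Draw n_phrases runs of phrase_len consecutive words from text.

    Assumption: a "semi-random search phrase" is modeled here as a
    contiguous word n-gram taken from a random position in the text.
    """
    words = text.split()
    if len(words) < phrase_len:
        return []  # not enough words to form a single phrase

    rng = random.Random(seed)  # seeded so the sampling is reproducible
    phrases = []
    for _ in range(n_phrases):
        start = rng.randrange(len(words) - phrase_len + 1)
        phrases.append(" ".join(words[start:start + phrase_len]))
    return phrases


# Made-up stand-in for a language-specific Wikipedia extract (Estonian).
SAMPLE_TEXT = "Tallinn on Eesti pealinn ja suurim linn"

if __name__ == "__main__":
    for phrase in sample_search_phrases(SAMPLE_TEXT, n_phrases=5):
        print(phrase)
```

Each sampled phrase would then serve as a video search query for that language. Retrieval, speech activity detection, speaker diarization, and post-filtering are outside the scope of this sketch.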

Tasks

Action Detection

Activity Detection

Language Identification

Speaker Diarization

Spoken language identification

Results from the Paper

Ranked #1 on Spoken language identification on KALAKA-3

