GOLOS

Introduced by Karpov et al. in Golos: Russian Dataset for Speech Research

Golos is a Russian speech dataset suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours.

Dataset structure

| Domain | Train files | Train hours | Test files | Test hours |
|---|---|---|---|---|
| Crowd | 979 796 | 1 095 | 9 994 | 11.2 |
| Farfield | 124 003 | 132.4 | 1 916 | 1.4 |
| Total | 1 103 799 | 1 227.4 | 11 910 | 12.6 |

Audio files in opus format

| Archive | Size | Link |
|---|---|---|
| golos_opus.tar | 20.5 GB | https://sc.link/JpD |

Audio files in wav format

| Archive | Size | Link |
|---|---|---|
| train_farfield.tar | 15.4 GB | https://sc.link/1Z3 |
| train_crowd0.tar | 11 GB | https://sc.link/Lrg |
| train_crowd1.tar | 14 GB | https://sc.link/MvQ |
| train_crowd2.tar | 13.2 GB | https://sc.link/NwL |
| train_crowd3.tar | 11.6 GB | https://sc.link/Oxg |
| train_crowd4.tar | 15.8 GB | https://sc.link/Pyz |
| train_crowd5.tar | 13.1 GB | https://sc.link/Qz7 |
| train_crowd6.tar | 15.7 GB | https://sc.link/RAL |
| train_crowd7.tar | 12.7 GB | https://sc.link/VG5 |
| train_crowd8.tar | 12.2 GB | https://sc.link/WJW |
| train_crowd9.tar | 8.08 GB | https://sc.link/XKk |
| test.tar | 1.3 GB | https://sc.link/Kqr |
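
Once an archive from the list above has been downloaded, it can be unpacked with Python's standard `tarfile` module. The sketch below is only an illustration: the helper name `extract_archive` and the destination directory are assumptions, and the internal layout of the Golos archives is not described here, so the function simply returns whatever member names it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return its member names.

    Hypothetical helper for illustration; not part of any Golos tooling.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        # Refuse members that would be written outside the destination.
        for member in members:
            target = (root / member.name).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
    return [member.name for member in members]


# Example call, assuming test.tar was already downloaded to the working directory:
# names = extract_archive("test.tar", "golos")
```

Checking member paths before extraction keeps the example independent of the extraction filters that newer Python versions add to `tarfile`.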

Evaluation

Word Error Rate (%) for different test sets

| Decoder \ Test set | Crowd test | Farfield test | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
|---|---|---|---|---|
| Greedy decoder | 4.389 % | 14.949 % | 9.314 % | 11.278 % |
| Beam Search with Common Crawl LM | 4.709 % | 12.503 % | 6.341 % | 7.976 % |
| Beam Search with Golos train set LM | 3.548 % | 12.384 % | - | - |
| Beam Search with Common Crawl and Golos LM | 3.318 % | 11.488 % | 6.4 % | 8.06 % |
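
For readers unfamiliar with the metric, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognized hypothesis, divided by the number of reference words. The sketch below computes it for a single utterance pair. The function name `word_error_rate` is an assumption, and the text normalization and corpus-level aggregation behind the figures above are not stated here, so this may not reproduce them exactly.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER in percent for one reference/hypothesis pair.

    Illustrative sketch of the standard definition; whitespace tokenization
    is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution in a three-word reference gives a WER of one third.
print(round(word_error_rate("привет как дела", "привет как дело"), 2))
```

Because insertions count as errors, the value can exceed 100 % when the hypothesis is much longer than the reference.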
