Golos is a Russian speech dataset suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours.
Dataset structure
| Domain | Train files | Train hours | Test files | Test hours |
|---|---|---|---|---|
| Crowd | 979 796 | 1 095 | 9 994 | 11.2 |
| Farfield | 124 003 | 132.4 | 1 916 | 1.4 |
| Total | 1 103 799 | 1 227.4 | 11 910 | 12.6 |
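As a quick sanity check, the per-domain figures in the table above can be summed and compared with the Total row and with the stated overall duration of about 1240 hours. The sketch below only copies numbers from the table; the names `DOMAINS` and `column_totals` are illustrative and not part of any Golos tooling.

```python
# Per-domain figures copied from the table:
# (train files, train hours, test files, test hours)
DOMAINS = {
    "Crowd": (979_796, 1095.0, 9_994, 11.2),
    "Farfield": (124_003, 132.4, 1_916, 1.4),
}


def column_totals(domains):
    """Sum each column across domains, rounding to one decimal place."""
    columns = zip(*domains.values())
    return tuple(round(sum(column), 1) for column in columns)


train_files, train_hours, test_files, test_hours = column_totals(DOMAINS)
print("Train:", train_files, "files,", train_hours, "hours")
print("Test:", test_files, "files,", test_hours, "hours")
print("All audio:", round(train_hours + test_hours, 1), "hours")
```

The sums should match the Total row, and train plus test hours together give the overall duration quoted in the introduction.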
Audio files in opus format
| Archive | Size | Link |
|---|---|---|
| golos_opus.tar | 20.5 GB | https://sc.link/JpD |
Audio files in wav format
| Archives | Size | Links |
|---|---|---|
| train_farfield.tar | 15.4 GB | https://sc.link/1Z3 |
| train_crowd0.tar | 11 GB | https://sc.link/Lrg |
| train_crowd1.tar | 14 GB | https://sc.link/MvQ |
| train_crowd2.tar | 13.2 GB | https://sc.link/NwL |
| train_crowd3.tar | 11.6 GB | https://sc.link/Oxg |
| train_crowd4.tar | 15.8 GB | https://sc.link/Pyz |
| train_crowd5.tar | 13.1 GB | https://sc.link/Qz7 |
| train_crowd6.tar | 15.7 GB | https://sc.link/RAL |
| train_crowd7.tar | 12.7 GB | https://sc.link/VG5 |
| train_crowd8.tar | 12.2 GB | https://sc.link/WJW |
| train_crowd9.tar | 8.08 GB | https://sc.link/XKk |
| test.tar | 1.3 GB | https://sc.link/Kqr |
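Once the archives have been downloaded, they can be unpacked with Python's standard `tarfile` module. This is a minimal sketch, not an official Golos script: the function name `extract_archives` and the `downloads` and `golos` directory names are assumptions, and the internal layout of the archives is not described here, so inspect the extracted files before relying on any particular structure.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archives(archive_dir: Path, target_dir: Path) -> List[str]:
    """Extract every .tar file found in archive_dir into target_dir.

    Returns the names of the archives that were processed, in sorted order.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for archive in sorted(archive_dir.glob("*.tar")):
        # tarfile.open detects the archive format automatically in read mode
        with tarfile.open(archive) as tar:
            tar.extractall(path=target_dir)
        processed.append(archive.name)
    return processed


if __name__ == "__main__":
    # Hypothetical locations: archives saved to ./downloads, unpacked to ./golos
    for name in extract_archives(Path("downloads"), Path("golos")):
        print("extracted", name)
```

The wav archives in the table add up to well over 100 GB before extraction, so make sure the target disk has enough free space.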
Evaluation
Word Error Rate (%) on different test sets
| Decoder \ Test set | Crowd test | Farfield test | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
|---|---|---|---|---|
| Greedy decoder | 4.389 % | 14.949 % | 9.314 % | 11.278 % |
| Beam Search with Common Crawl LM | 4.709 % | 12.503 % | 6.341 % | 7.976 % |
| Beam Search with Golos train set LM | 3.548 % | 12.384 % | - | - |
| Beam Search with Common Crawl and Golos LM | 3.318 % | 11.488 % | 6.4 % | 8.06 % |
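For reference, Word Error Rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes a corpus-level WER in percent. It assumes plain whitespace tokenization and no text normalization, so it is not necessarily the exact procedure used to produce the numbers above; the names `edit_distance` and `corpus_wer` are illustrative.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER in percent over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references must contain at least one word")
    return 100.0 * errors / words


if __name__ == "__main__":
    # Toy transcripts for illustration only, not taken from the dataset
    pairs = [
        ("привет мир", "привет мир"),
        ("включи свет на кухне", "включи свет кухне"),
    ]
    print(f"WER: {corpus_wer(pairs):.3f} %")
```

Summing errors over the whole test set before dividing weights long utterances more heavily than averaging per-utterance WER would, which is the usual convention for corpus-level results.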