CoVoST is a large-scale multilingual speech-to-text translation corpus. Its latest 2nd version covers translations from 21 languages into English and from English into 15 languages. It has total 2880 hours of speech and is diversified with 78K speakers and 66 accents.
32 PAPERS • NO BENCHMARKS YET
Data collection was conducted by asking some adults from social media and some students from an elementary school to participate in our experiment. Table.1 shows the number of data gathered for recognizing each color. Due to the fact that two words are used for black in Persian, the number of black samples is more. In addition, because the color recognition is a RAN task, a sequence of data has been gathered. Table.2 depicts the number of sequence data for colors. For the meaningless words, 12 voices have been gathered on average for each word (there are 40 meaningless words in this task).
1 PAPER • NO BENCHMARKS YET