gtzan_music_speech is a dataset for music/speech discrimination. It consists of 120 tracks of 30 second length. Each class (music/speech) has 60 samples. The tracks are all 22050Hz Mono 16-bit audio files in .wav format.
Paper | Code | Results | Date | Stars |
---|