Here we release Multi_Channel_Grid (abbreviated as MC_Grid), the dataset used in our paper LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION.
MC_Grid, which is based on the GRID dataset, includes multi-channel audio, extracted voiceprints, and visual features. The feature extraction method is described below.
MC_Grid is prepared specifically for the speaker extraction task, and our code is available at aispeech-lab/LiMuSE. Feel free to contact us if you have any questions or suggestions.