22 Nov 2023 • Duowei Tang, Peter Kuppens, Luc Geurts, Toon van Waterschoot
We use the pre-trained wav2vec 2.0 model to transform time-domain audio waveforms from different languages, speakers, and recording conditions into a feature space shared by multiple languages, thereby reducing language variability in the speech features.
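
To give a rough feel for what this waveform-to-feature transformation does to the time axis, the sketch below computes how many feature frames the wav2vec 2.0 convolutional encoder would emit for a waveform of a given length. The kernel widths and strides are those of the standard wav2vec 2.0 architecture. The 16 kHz sampling rate and the name `num_feature_frames` are assumptions made for illustration and are not taken from the text above, which does not say which checkpoint or settings were used.

```python
# Kernel widths and strides of the seven 1-D convolution layers in the
# standard wav2vec 2.0 feature encoder (assumed here; the text above does
# not specify the exact model configuration).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_feature_frames(num_samples: int) -> int:
    """Return how many feature frames the encoder emits for a waveform.

    Each layer is an unpadded convolution, so its output length is
    floor((length - kernel) / stride) + 1.
    """
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # waveform too short to produce a single frame
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sampling rate in Hz
    for seconds in (1, 5):
        frames = num_feature_frames(seconds * sample_rate)
        print(f"{seconds} s of audio -> {frames} feature frames")
```

Under these assumptions, one second of 16 kHz audio maps to 49 frames, roughly one frame every 20 ms, regardless of language, speaker, or recording condition. Each frame is then a vector in the shared feature space described above.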