no code implementations • 12 Oct 2023 • Matthew Baas, Herman Kamper
Nevertheless, this shows that voice conversion models, and kNN-VC in particular, are increasingly applicable to a range of non-standard downstream tasks.
1 code implementation • 4 Jul 2023 • Matthew Baas, Herman Kamper
We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training.
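As a rough illustration of what a "simple linear operation" in a latent space can look like, the sketch below linearly interpolates between two latent vectors. The 512-dimensional size, the random vectors standing in for latents, and the `interpolate_latents` helper are all assumptions made for illustration. This is not ASGAN's code, and a trained generator would still be needed to turn each interpolated vector into audio.

```python
import numpy as np

def interpolate_latents(w_a, w_b, num_steps):
    """Return num_steps latent vectors evenly spaced on the line from w_a to w_b."""
    # Hypothetical helper: plain linear interpolation, w = (1 - t) * w_a + t * w_b.
    ts = np.linspace(0.0, 1.0, num_steps)[:, None]  # shape (num_steps, 1)
    return (1.0 - ts) * w_a[None, :] + ts * w_b[None, :]

rng = np.random.default_rng(0)
w_a = rng.standard_normal(512)  # stand-in for the latent of one utterance (assumed size)
w_b = rng.standard_normal(512)  # stand-in for the latent of another utterance
path = interpolate_latents(w_a, w_b, num_steps=5)
print(path.shape)  # (5, 512)
```

The first and last rows of `path` equal `w_a` and `w_b`, and the middle row is their average. In a disentangled space, one would hope that each intermediate vector decodes to a plausible output, but that depends on the trained model.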
1 code implementation • 30 May 2023 • Matthew Baas, Benjamin van Niekerk, Herman Kamper
Any-to-any voice conversion aims to transform source speech into a target voice with just a few examples of the target speaker as a reference.
Ranked #1 on Voice Conversion on LibriSpeech test-clean (using extra training data)
1 code implementation • 14 Oct 2022 • Matthew Baas, Kevin Eloff, Herman Kamper
In this work we aim to see whether the benefits of diffusion models can also be realized for speech recognition.
1 code implementation • 11 Oct 2022 • Matthew Baas, Herman Kamper
As in the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer.
no code implementations • 4 Nov 2021 • Matthew Baas, Herman Kamper
In this work we assess whether a VC system can be used cross-lingually to improve low-resource speech recognition.
1 code implementation • 2 Aug 2021 • Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper
In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent.
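To make the idea of a per-utterance mean concrete, here is a minimal sketch on synthetic data. The arrays below are random stand-ins rather than real CPC features, and the assumption that speakers differ by a constant per-frame offset is a toy simplification chosen only so the effect is visible. The helper names `utterance_mean` and `cosine` are hypothetical.

```python
import numpy as np

def utterance_mean(features):
    """Average a (num_frames, feature_dim) array over time into one vector."""
    return features.mean(axis=0)

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
# Toy assumption: each "speaker" adds a constant offset to every frame.
offset_a = rng.standard_normal(256)
offset_b = rng.standard_normal(256)
utt_a1 = offset_a + 0.1 * rng.standard_normal((120, 256))  # speaker A, utterance 1
utt_a2 = offset_a + 0.1 * rng.standard_normal((90, 256))   # speaker A, utterance 2
utt_b1 = offset_b + 0.1 * rng.standard_normal((150, 256))  # speaker B, utterance 1

same = cosine(utterance_mean(utt_a1), utterance_mean(utt_a2))
diff = cosine(utterance_mean(utt_a1), utterance_mean(utt_b1))
print(same > diff)
```

In this toy setup the mean vectors of two utterances from the same synthetic speaker are far more similar than those from different speakers, which is the kind of behaviour a speaker-revealing mean would produce. Whether real features behave this way is an empirical question that the paper addresses.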
no code implementations • 31 May 2021 • Matthew Baas, Herman Kamper
We specifically extend the recent StarGAN-VC model by conditioning it on a speaker embedding (from a potentially unseen speaker).