no code implementations • 31 Jan 2024 • Herman Kamper, Benjamin van Niekerk
We revisit a self-supervised method that divides unlabelled speech into word-like segments.
1 code implementation • 12 Jul 2023 • Benjamin van Niekerk, Marc-André Carbonneau, Herman Kamper
Voice conversion aims to transform source speech into a different target voice.
1 code implementation • 30 May 2023 • Matthew Baas, Benjamin van Niekerk, Herman Kamper
Any-to-any voice conversion aims to transform source speech into a target voice with just a few examples of the target speaker as a reference.
Ranked #1 on Voice Conversion on LibriSpeech test-clean (using extra training data)
no code implementations • 25 May 2023 • Leanne Nortje, Benjamin van Niekerk, Herman Kamper
Our approach involves using the given word-image example pairs to mine new unsupervised word-image training pairs from large collections of unlabelled speech and images.
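The mining step above can be sketched as nearest-neighbour matching in an embedding space. This is a toy illustration, not the paper's actual pipeline: the function names, the cosine threshold, and the use of plain vectors for speech segments and images are all assumptions made here for clarity.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mine_pairs(seed_speech, seed_image, unlab_speech, unlab_images, thresh=0.9):
    """For one seed word-image pair, collect unlabelled speech segments close
    to the seed speech and unlabelled images close to the seed image, then
    cross-pair the matches as new (speech, image) training pairs."""
    speech_hits = [s for s in unlab_speech if cosine(s, seed_speech) > thresh]
    image_hits = [i for i in unlab_images if cosine(i, seed_image) > thresh]
    return [(s, i) for s in speech_hits for i in image_hits]

# Toy example: one seed pair and two unlabelled candidates per modality.
seed_s, seed_i = np.array([1.0, 0.0]), np.array([0.0, 1.0])
unlab_s = [np.array([1.0, 0.1]), np.array([0.0, 1.0])]   # first is a match
unlab_i = [np.array([0.1, 1.0]), np.array([1.0, 0.0])]   # first is a match
pairs = mine_pairs(seed_s, seed_i, unlab_s, unlab_i, thresh=0.9)
```

Only the candidates whose similarity to the seed exceeds the threshold survive, so here exactly one new pair is mined.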
2 code implementations • 3 Nov 2021 • Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Matthew Baas, Hugo Seuté, Herman Kamper
Specifically, we compare discrete and soft speech units as input features.
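The discrete/soft distinction can be illustrated with a toy vector quantiser. The released code uses learned networks; this sketch (assumed shapes and a distance-based codebook of my own choosing) only shows the two output types: a hard code index per frame versus a full distribution over the codebook.

```python
import numpy as np

def discrete_and_soft_units(features, codebook, temperature=1.0):
    """Discrete units: index of the nearest codebook entry per frame.
    Soft units: a softmax distribution over the codebook per frame."""
    # Squared distances between each frame (T, D) and each code (K, D).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    discrete = d2.argmin(axis=1)                        # (T,) hard indices
    logits = -d2 / temperature
    soft = np.exp(logits - logits.max(axis=1, keepdims=True))
    soft /= soft.sum(axis=1, keepdims=True)             # (T, K) distributions
    return discrete, soft

# Toy example: two frames, each near a different codebook entry.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
features = np.array([[0.1, 0.0], [9.9, 10.0]])
hard, soft = discrete_and_soft_units(features, codebook)
```

Soft units retain how close a frame is to every code, whereas discrete units throw that graded information away, which is the trade-off the paper examines.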
1 code implementation • 2 Aug 2021 • Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper
In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent.
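The observation above suggests a very simple baseline: if the per-utterance mean carries speaker information, subtracting it should reduce that information. This is a minimal sketch under assumed shapes (frames as a `(T, D)` array), not the paper's implementation.

```python
import numpy as np

def mean_speaker_embedding(features: np.ndarray) -> np.ndarray:
    """Per-utterance mean over time; features has shape (T, D)."""
    return features.mean(axis=0)

def speaker_normalise(features: np.ndarray) -> np.ndarray:
    """Subtract the utterance-level mean so frames carry less speaker info."""
    return features - mean_speaker_embedding(features)

# Toy example: two 'utterances' whose constant offsets stand in for speakers.
rng = np.random.default_rng(0)
utt_a = rng.normal(size=(100, 8)) + 5.0   # speaker offset +5
utt_b = rng.normal(size=(120, 8)) - 5.0   # speaker offset -5
norm_a, norm_b = speaker_normalise(utt_a), speaker_normalise(utt_b)
```

After normalisation both utterances are centred at zero, so the offset that distinguished the "speakers" is gone while the frame-to-frame variation remains.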
no code implementations • 14 Dec 2020 • Kayode Olaleye, Benjamin van Niekerk, Herman Kamper
Of the two forms of supervision, the visually trained model performs worse than the model trained on bag-of-words (BoW) labels.
no code implementations • 14 Dec 2020 • Herman Kamper, Benjamin van Niekerk
We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units.
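Once contiguous frames share a code, the segmentation itself is just run-length grouping of the code sequence. A minimal sketch of that final step (the constrained VQ training that produces such codes is not shown):

```python
from itertools import groupby

def segment_codes(codes):
    """Collapse runs of identical code indices into (code, start, end)
    segments, giving a variable-rate segmentation of the frame sequence."""
    segments, start = [], 0
    for code, run in groupby(codes):
        length = len(list(run))
        segments.append((code, start, start + length))
        start += length
    return segments

# Example: frame-level codes where contiguous frames share a code.
codes = [3, 3, 3, 7, 7, 1, 1, 1, 1]
print(segment_codes(codes))  # [(3, 0, 3), (7, 3, 5), (1, 5, 9)]
```

Nine fixed-rate frames become three variable-length discrete units, which is the sense in which the segmentation is variable-rate.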
2 code implementations • 19 May 2020 • Benjamin van Niekerk, Leanne Nortje, Herman Kamper
The idea is to learn a representation of speech by predicting future acoustic units.
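The training target construction behind "predicting future acoustic units" can be sketched as pairing each time step with the unit a fixed number of steps ahead. This toy function only builds those (context, target) pairs; the predictive model and loss are omitted, and the names here are assumptions.

```python
import numpy as np

def future_prediction_pairs(units: np.ndarray, k: int = 3):
    """Pair each position t with the acoustic unit at t + k: the model's
    representation at time t is trained to predict units[t + k]."""
    contexts = np.arange(len(units) - k)   # positions with a valid target
    targets = units[k:]                    # the units k steps in the future
    return contexts, targets

# Toy sequence of discrete acoustic units.
units = np.array([4, 4, 2, 2, 9, 9, 1])
ctx, tgt = future_prediction_pairs(units, k=2)
# ctx = [0, 1, 2, 3, 4], tgt = [2, 2, 9, 9, 1]
```

Because the targets are discrete units rather than raw waveforms, the representation is pushed to capture phonetic content rather than low-level acoustic detail.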
Ranked #1 on Voice Conversion on ZeroSpeech 2019 English (using extra training data)
no code implementations • 7 Apr 2020 • Benjamin van Niekerk, Andreas Damianou, Benjamin Rosman
The environment's dynamics are learned from limited training data and can be reused in new task instances without retraining.
no code implementations • 13 Oct 2019 • Arnu Pretorius, Elan van Biljon, Benjamin van Niekerk, Ryan Eloff, Matthew Reynard, Steve James, Benjamin Rosman, Herman Kamper, Steve Kroon
Our results therefore suggest that, in the shallow-to-moderate depth setting, critical initialisation provides zero performance gains over off-critical initialisations, and that searching for off-critical initialisations that might improve training speed or generalisation is likely to be a fruitless endeavour.
no code implementations • 16 Apr 2019 • Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper
For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable neural networks to unlabelled speech and use the discovered units for speech synthesis.
no code implementations • 12 Jul 2018 • Benjamin van Niekerk, Steven James, Adam Earle, Benjamin Rosman
An important property for lifelong-learning agents is the ability to combine existing skills to solve unseen tasks.