1 code implementation • 9 Sep 2024 • Leanne Nortje, Dan Oneata, Herman Kamper
On English, this results in only a small drop in performance.
no code implementations • 3 Sep 2024 • Leanne Nortje
This dissertation examines visually grounded speech (VGS) models that learn from unlabelled speech paired with images.
no code implementations • 20 Mar 2024 • Leanne Nortje, Dan Oneaţă, Yevgen Matusevych, Herman Kamper
To simulate prior acoustic and visual knowledge, we experiment with several initialisation strategies using pretrained speech and vision networks.
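As a rough illustration of one such initialisation strategy (a minimal PyTorch sketch; the ResNet-50 backbone and the small convolutional speech encoder are illustrative stand-ins, not the paper's exact networks):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Vision branch: initialise from an ImageNet-pretrained ResNet-50 and
# drop the classifier head so it outputs a feature vector per image.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
vision_encoder = nn.Sequential(*list(resnet.children())[:-1])

# Speech branch: a small convolutional encoder over mel-spectrogram frames.
# In practice its weights could instead be copied from a pretrained speech
# network (the "prior acoustic knowledge" condition).
speech_encoder = nn.Sequential(
    nn.Conv1d(40, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(256, 512, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),  # pool over time: one embedding per utterance
)

images = torch.randn(1, 3, 224, 224)          # dummy image batch
speech = torch.randn(1, 40, 100)              # dummy 40-dim mel frames
img_emb = vision_encoder(images).flatten(1)   # (1, 2048)
spc_emb = speech_encoder(speech).squeeze(-1)  # (1, 512)
```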
no code implementations • 20 Jun 2023 • Leanne Nortje, Dan Oneata, Herman Kamper
We propose an approach that can work on natural word-image pairs but with fewer examples, i.e. fewer shots, and then illustrate how this approach can be applied for multimodal few-shot learning in a real low-resource language, Yorùbá.
no code implementations • 25 May 2023 • Leanne Nortje, Benjamin van Niekerk, Herman Kamper
Our approach involves using the given word-image example pairs to mine new unsupervised word-image training pairs from large collections of unlabelled speech and images.
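A rough sketch of this mining idea, assuming embeddings have already been computed for the support pairs and the unlabelled collections (the cosine-similarity and top-k heuristics here are illustrative assumptions, not necessarily the paper's exact procedure):

```python
import numpy as np

def mine_pairs(support_words, support_images,
               unlab_words, unlab_images, top_k=100):
    """Mine new word-image training pairs from unlabelled collections.

    All arguments are 2-D arrays of embeddings; row i of support_words
    and support_images forms one given example pair.
    """
    def cosine(matrix, vector):
        return matrix @ vector / (
            np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector) + 1e-8)

    new_pairs = []
    for word, image in zip(support_words, support_images):
        word_idx = np.argsort(-cosine(unlab_words, word))[:top_k]
        image_idx = np.argsort(-cosine(unlab_images, image))[:top_k]
        # simple heuristic: pair the k-th closest utterance with the
        # k-th closest image for this support example
        new_pairs.extend(zip(word_idx.tolist(), image_idx.tolist()))
    return new_pairs
```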
1 code implementation • 12 Oct 2022 • Leanne Nortje, Herman Kamper
We formalise this task and call it visually prompted keyword localisation (VPKL): given an image depicting a keyword, detect whether the keyword occurs in an utterance and predict where.
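One simple way to score this task, sketched here under the assumption that an image query embedding and per-frame utterance embeddings are available (thresholded-max detection and argmax localisation are illustrative choices, not necessarily the paper's exact method):

```python
import numpy as np

def vpkl_score(image_query, utterance_frames, threshold=0.5):
    """Detect and localise a keyword given an image query.

    image_query: (D,) embedding of the query image.
    utterance_frames: (T, D) per-frame embeddings of the utterance.
    """
    q = image_query / (np.linalg.norm(image_query) + 1e-8)
    f = utterance_frames / (
        np.linalg.norm(utterance_frames, axis=1, keepdims=True) + 1e-8)
    sims = f @ q                             # cosine similarity per frame, (T,)
    detected = bool(sims.max() > threshold)  # does the keyword occur at all?
    location = int(sims.argmax())            # frame where it is most likely
    return detected, location
```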
1 code implementation • 2 Aug 2021 • Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper
In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent.
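Given that finding, a natural remedy is to normalise the features per utterance; a minimal sketch of one such normalisation (mean removal):

```python
import numpy as np

def remove_utterance_mean(cpc_features):
    """Subtract the per-utterance mean from frame-level CPC features.

    cpc_features: (T, D) features for one utterance. Because the mean
    captures much of the speaker information, removing it leaves
    features that are more speaker-invariant.
    """
    return cpc_features - cpc_features.mean(axis=0, keepdims=True)
```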
1 code implementation • 10 Dec 2020 • Leanne Nortje, Herman Kamper
We propose direct multimodal few-shot models that learn a shared embedding space of spoken words and images from only a few paired examples.
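A common way to learn such a shared space is a ranking loss over paired embeddings in a batch; the sketch below is a generic version of this idea (the margin hinge formulation is an illustrative assumption, not necessarily the paper's exact objective):

```python
import torch

def ranking_loss(word_emb, image_emb, margin=0.2):
    """Batch hinge ranking loss over a shared embedding space.

    word_emb, image_emb: (B, D) L2-normalised embeddings; row i of each
    tensor is a positive (paired) example.
    """
    sims = word_emb @ image_emb.t()    # (B, B) similarity matrix
    pos = sims.diag().unsqueeze(1)     # similarity of each true pair
    cost = (margin + sims - pos).clamp(min=0)  # hinge on mismatched pairs
    off_diag = ~torch.eye(sims.size(0), dtype=torch.bool,
                          device=sims.device)
    return cost[off_diag].mean()
```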
1 code implementation • 14 Aug 2020 • Leanne Nortje, Herman Kamper
Here we compare transfer learning to unsupervised models trained on unlabelled in-domain data.
2 code implementations • 19 May 2020 • Benjamin van Niekerk, Leanne Nortje, Herman Kamper
The idea is to learn a representation of speech by predicting future acoustic units.
Ranked #1 on Voice Conversion on ZeroSpeech 2019 English (using extra training data)
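A toy sketch of this future-unit prediction idea, assuming PyTorch (heavily simplified: the codebook here is fixed rather than learned, whereas a real vector-quantised model trains it jointly with commitment and codebook losses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FutureUnitPredictor(nn.Module):
    """Encode frames, assign each to a discrete acoustic unit, and train
    a context network to predict the unit a few steps ahead."""

    def __init__(self, in_dim=40, dim=256, n_codes=512, horizon=3):
        super().__init__()
        self.encoder = nn.Linear(in_dim, dim)
        # fixed random codebook; a real VQ model learns this jointly
        self.codebook = nn.Embedding(n_codes, dim)
        self.context = nn.GRU(dim, dim, batch_first=True)
        self.predict = nn.Linear(dim, n_codes)
        self.horizon = horizon

    def forward(self, frames):               # frames: (B, T, in_dim)
        z = self.encoder(frames)             # (B, T, dim)
        # nearest codebook entry gives each frame a discrete unit index
        dists = torch.cdist(z.reshape(-1, z.size(-1)), self.codebook.weight)
        codes = dists.argmin(-1).view(z.shape[:-1])    # (B, T)
        ctx, _ = self.context(z)             # summarise the past
        logits = self.predict(ctx[:, :-self.horizon])  # predict ahead
        targets = codes[:, self.horizon:]
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
```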
no code implementations • 16 Apr 2019 • Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper
For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable neural networks to unlabelled speech and use the discovered units for speech synthesis.
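The core of such a model is a vector-quantisation bottleneck; a minimal sketch, assuming PyTorch (the straight-through estimator is the standard VQ-VAE trick; the commitment and codebook losses a full model needs are omitted):

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Discrete bottleneck: snap each continuous frame to its nearest
    codebook entry, with a straight-through gradient estimator so the
    encoder still trains end-to-end."""

    def __init__(self, n_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z):                    # z: (B, T, dim)
        flat = z.reshape(-1, z.size(-1))
        dists = torch.cdist(flat, self.codebook.weight)
        idx = dists.argmin(-1)               # discovered unit per frame
        quantised = self.codebook(idx).view_as(z)
        # straight-through: copy gradients from quantised back to z
        quantised = z + (quantised - z).detach()
        return quantised, idx.view(z.shape[:-1])
```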