no code implementations • DCLRL (LREC) 2022 • Salomon Kabongo Kabenamualu, Vukosi Marivate, Herman Kamper
In recent years there has been great interest in addressing the data scarcity of African languages and providing baseline models for different Natural Language Processing tasks (Orife et al., 2020).
Tasks: Automatic Speech Recognition (ASR), +3
1 code implementation • 14 Oct 2022 • Matthew Baas, Kevin Eloff, Herman Kamper
In this work we aim to see whether the benefits of diffusion models can also be realized for speech recognition.
1 code implementation • 12 Oct 2022 • Leanne Nortje, Herman Kamper
We formalise this task and call it visually prompted keyword localisation (VPKL): given an image of a keyword, detect and predict where in an utterance the keyword occurs.
1 code implementation • 11 Oct 2022 • Matthew Baas, Herman Kamper
As in the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer.
no code implementations • 10 Oct 2022 • Kayode Olaleye, Dan Oneata, Herman Kamper
We collect and release a new single-speaker dataset of audio captions for 6k Flickr images in Yorùbá, a real low-resource language spoken in Nigeria.
no code implementations • 23 Jun 2022 • Werner van der Merwe, Herman Kamper, Johan du Preez
In this paper, we present an extension to LDA that uses a Markov chain to model temporal information.
3 code implementations • 24 Feb 2022 • Herman Kamper
This paper instead revisits an older approach to word segmentation: bottom-up phone-like unit discovery is performed first, and symbolic word segmentation is then performed on top of the discovered units (without influencing the lower level).
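The two-stage recipe above (discover phone-like units bottom-up, then segment the resulting unit string symbolically) can be illustrated with a toy dynamic program. The lexicon, segment costs and penalty below are invented for illustration and are not the paper's actual scoring:

```python
# Toy sketch of the two-stage idea (not the paper's exact algorithm):
# phone-like units are discovered first, then a symbolic word
# segmenter runs on top of the unit sequence. Here the segmenter is
# a dynamic program that prefers segments found in a hypothetical
# unit "lexicon", with a per-segment penalty rewarding longer chunks.

def segment(units, lexicon, max_len=4, seg_penalty=1.0):
    """Return the lowest-cost segmentation of a unit sequence."""
    n = len(units)
    best = [float("inf")] * (n + 1)  # best[i]: cost of units[:i]
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for l in range(1, min(max_len, i) + 1):
            piece = tuple(units[i - l:i])
            # in-lexicon segments are cheap; unknown units cost extra
            cost = best[i - l] + seg_penalty + (0.0 if piece in lexicon else 2.0 * l)
            if cost < best[i]:
                best[i], back[i] = cost, i - l
    # recover the segmentation by walking the backpointers
    segs, i = [], n
    while i > 0:
        segs.append(tuple(units[back[i]:i]))
        i = back[i]
    return segs[::-1]

lexicon = {(3, 7), (1, 2, 5)}
print(segment([1, 2, 5, 3, 7, 9], lexicon))  # → [(1, 2, 5), (3, 7), (9,)]
```

Because the symbolic segmenter never feeds back into unit discovery, the lower level is fixed before word segmentation begins, matching the "without influencing the lower level" constraint.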
1 code implementation • 2 Feb 2022 • Kayode Olaleye, Dan Oneata, Herman Kamper
Masked-based localisation gives some of the best reported localisation scores from a VGS model, with an accuracy of 57% when the system knows that a keyword occurs in an utterance and needs to predict its location.
no code implementations • 4 Nov 2021 • Matthew Baas, Herman Kamper
In this work we assess whether a VC system can be used cross-lingually to improve low-resource speech recognition.
no code implementations • 4 Nov 2021 • Kevin Eloff, Arnu Pretorius, Okko Räsänen, Herman A. Engelbrecht, Herman Kamper
The Speaker is equipped with a vocoder that maps symbols to a continuous waveform; this is passed over a lossy continuous channel, and the Listener needs to map the continuous signal back to the concept.
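The Speaker-channel-Listener loop can be sketched end to end. In the paper both agents are learned; in this hypothetical sketch both ends are fixed sinusoid codebooks and the lossy channel is additive Gaussian noise, purely to show the shape of the pipeline:

```python
import numpy as np

# Hypothetical sketch of the Speaker -> channel -> Listener loop.
# The actual agents are trained; here both ends use a fixed
# codebook of carrier frequencies, one per discrete symbol.

SR = 8000                       # sample rate (Hz)
FREQS = [400.0, 800.0, 1200.0]  # one carrier frequency per symbol

def speak(symbol, dur=0.05):
    """Speaker: map a discrete symbol to a continuous waveform."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * FREQS[symbol] * t)

def channel(wave, noise_std=0.3, rng=None):
    """Lossy continuous channel: additive Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    return wave + rng.normal(0.0, noise_std, size=wave.shape)

def listen(wave):
    """Listener: pick the symbol whose carrier correlates best."""
    t = np.arange(len(wave)) / SR
    scores = [abs(np.dot(wave, np.sin(2 * np.pi * f * t))) for f in FREQS]
    return int(np.argmax(scores))

sent = 2
received = listen(channel(speak(sent)))
print(sent, received)
```

With carriers spaced well apart, the correlation decoder recovers the symbol despite the channel noise; the interesting question in the paper is whether learned agents converge to a similarly robust code.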
1 code implementation • 3 Nov 2021 • Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Matthew Baas, Hugo Seuté, Herman Kamper
Specifically, we compare discrete and soft speech units as input features.
no code implementations • 13 Aug 2021 • Ewald van der Westhuizen, Herman Kamper, Raghav Menon, John Quinn, Thomas Niesler
We show that, using these features, the CNN-DTW keyword spotter performs almost as well as the DTW keyword spotter while outperforming a baseline CNN trained only on the keyword templates.
1 code implementation • 2 Aug 2021 • Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper
In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent.
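If each speaker shifts a whole feature sequence by a roughly constant offset, then the per-utterance mean carries that speaker information, and subtracting it is a cheap normalisation. A minimal sketch with synthetic data (not real CPC features):

```python
import numpy as np

# Illustrative sketch on hypothetical data: the same "content"
# frames are offset by a per-speaker constant, so the per-utterance
# mean encodes the speaker, and removing it normalises speakers away.

rng = np.random.default_rng(1)
content = rng.normal(size=(50, 16))   # shared "content" frames
spk_a = content + 3.0                 # same content, speaker A offset
spk_b = content - 3.0                 # same content, speaker B offset

def normalise(feats):
    """Remove the per-utterance mean from a (frames, dims) array."""
    return feats - feats.mean(axis=0, keepdims=True)

# before normalisation the utterances are far apart; after, (nearly) identical
before = np.linalg.norm(spk_a - spk_b)
after = np.linalg.norm(normalise(spk_a) - normalise(spk_b))
print(before, after)
```

Real CPC features mix speaker and content less cleanly than this additive toy, which is exactly why the paper measures how much speaker information the mean actually captures.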
1 code implementation • 3 Jul 2021 • Arnu Pretorius, Kale-ab Tessera, Andries P. Smit, Claude Formanek, St John Grimbly, Kevin Eloff, Siphelele Danisa, Lawrence Francis, Jonathan Shock, Herman Kamper, Willie Brink, Herman Engelbrecht, Alexandre Laterre, Karim Beguir
We provide experimental results for these implementations on a wide range of multi-agent environments and highlight the benefits of distributed system training.
2 code implementations • 24 Jun 2021 • Christiaan Jacobs, Herman Kamper
Through finer-grained analysis, we show that training on even just a single related language gives the largest gain.
no code implementations • 16 Jun 2021 • Kayode Olaleye, Herman Kamper
Visually grounded speech models learn from images paired with spoken captions.
no code implementations • 31 May 2021 • Matthew Baas, Herman Kamper
We specifically extend the recent StarGAN-VC model by conditioning it on a speaker embedding (from a potentially unseen speaker).
2 code implementations • 19 Mar 2021 • Christiaan Jacobs, Yevgen Matusevych, Herman Kamper
We consider how a recent contrastive learning loss can be used in both the purely unsupervised and multilingual transfer settings.
no code implementations • EACL 2021 • Yevgen Matusevych, Herman Kamper, Thomas Schatz, Naomi H. Feldman, Sharon Goldwater
We then test the model on a spoken word processing task, showing that phonology may not be necessary to explain some of the word processing effects observed in non-native speakers.
no code implementations • 14 Dec 2020 • Lisa van Staden, Herman Kamper
We compare frame-level features from contrastive predictive coding (CPC), autoregressive predictive coding and a CAE to conventional MFCCs.
no code implementations • 14 Dec 2020 • Kayode Olaleye, Benjamin van Niekerk, Herman Kamper
Of the two forms of supervision, the visually trained model performs worse than the BoW-trained model.
no code implementations • 14 Dec 2020 • Herman Kamper, Benjamin van Niekerk
We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units.
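The constraint that contiguous blocks of feature vectors share a code can be written as a small dynamic program: each frame is assigned a codebook entry, paying quantisation error per frame plus a penalty each time the code switches, so the code sequence changes only at hypothesised unit boundaries. The tiny codebook and duration penalty below are illustrative, not the paper's model:

```python
import numpy as np

# Minimal sketch of the constrained-code idea: a DP assigns
# contiguous blocks of frames to a single codebook entry, with a
# duration penalty (dur_pen) that discourages switching codes at
# every frame. (Toy setup, not the paper's full VQ network.)

def segment_vq(frames, codebook, dur_pen=2.0):
    n, K = len(frames), len(codebook)
    dist = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    cost = np.full((n, K), np.inf)   # cost[i, k]: best cost, frame i coded k
    prev = np.zeros((n, K), dtype=int)
    cost[0] = dist[0] + dur_pen
    for i in range(1, n):
        for k in range(K):
            stay = cost[i - 1, k]                 # extend current segment
            j = int(np.argmin(cost[i - 1]))       # or start a new segment
            switch = cost[i - 1, j] + dur_pen
            if stay <= switch:
                cost[i, k], prev[i, k] = stay + dist[i, k], k
            else:
                cost[i, k], prev[i, k] = switch + dist[i, k], j
    # backtrack the per-frame code sequence
    codes = [int(np.argmin(cost[-1]))]
    for i in range(n - 1, 0, -1):
        codes.append(int(prev[i, codes[-1]]))
    return codes[::-1]

codebook = np.array([[0.0], [5.0]])
frames = np.array([[0.1], [0.2], [0.0], [4.9], [5.1], [5.0]])
print(segment_vq(frames, codebook))  # → [0, 0, 0, 1, 1, 1]
```

Runs of identical codes then directly give a variable-rate segmentation of the speech into discrete units, as in the snippet above.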
1 code implementation • 10 Dec 2020 • Leanne Nortje, Herman Kamper
We propose direct multimodal few-shot models that learn a shared embedding space of spoken words and images from only a few paired examples.
no code implementations • 3 Dec 2020 • Puyuan Peng, Herman Kamper, Karen Livescu
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.
4 code implementations • Findings of the Association for Computational Linguistics 2020 • Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkabir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Espoir Murhabazi, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Emezue, Bonaventure Dossou, Blessing Sibanda, Blessing Itoro Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Alp Öktem, Adewale Akinfaderin, Abdallah Bashir
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved.
1 code implementation • 14 Aug 2020 • Leanne Nortje, Herman Kamper
Here we compare transfer learning to unsupervised models trained on unlabelled in-domain data.
no code implementations • 6 Aug 2020 • Yevgen Matusevych, Thomas Schatz, Herman Kamper, Naomi H. Feldman, Sharon Goldwater
In the first year of life, infants' speech perception becomes attuned to the sounds of their native language.
1 code implementation • 2 Jun 2020 • Herman Kamper, Yevgen Matusevych, Sharon Goldwater
We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.
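The same/different objective behind the Siamese model can be sketched as a standard triplet-style hinge on distances: embeddings of the same word should be closer than embeddings of different words by a margin. This is a generic formulation; the paper's exact loss and distance may differ:

```python
import numpy as np

# Hedged sketch of the "same vs. different word" objective: a
# triplet hinge on cosine distances (generic, not the paper's
# exact loss). anchor/same come from the same word type.

def triplet_loss(anchor, same, diff, margin=0.5):
    """Hinge loss on cosine distances for one (anchor, same, diff) triple."""
    def cos_dist(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(0.0, margin + cos_dist(anchor, same) - cos_dist(anchor, diff))

a = np.array([1.0, 0.0])   # anchor embedding
s = np.array([0.9, 0.1])   # same word: already close, loss is zero
d = np.array([0.0, 1.0])   # different word: far away
print(triplet_loss(a, s, d))
```

Minimising this over many triples pulls same-word embeddings together and pushes different-word embeddings apart, which is what makes the embeddings discriminative across languages.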
2 code implementations • 19 May 2020 • Benjamin van Niekerk, Leanne Nortje, Herman Kamper
The idea is to learn a representation of speech by predicting future acoustic units.
Ranked #1 on Acoustic Unit Discovery on ZeroSpeech 2019 English.
no code implementations • 3 Apr 2020 • Yevgen Matusevych, Herman Kamper, Sharon Goldwater
To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs.
no code implementations • 28 Mar 2020 • Petri-Johan Last, Herman A. Engelbrecht, Herman Kamper
Dynamic programming is then used to align the feature frames between each word pair, serving as weak top-down supervision for the two models.
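The alignment step is classic dynamic time warping: a cumulative-cost table followed by a backtrace that yields matched frame pairs, exactly the kind of weak frame-level supervision described above. A minimal self-contained sketch of standard DTW (not the paper's specific features or costs):

```python
import numpy as np

# Minimal DTW sketch: align two feature sequences and return the
# matched (i, j) frame pairs. Standard algorithm with Euclidean
# frame distance; the paper's exact setup may differ.

def dtw_align(x, y):
    """Return aligned (i, j) frame pairs between sequences x and y."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack the optimal warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [0.1], [1.0], [2.1]])
print(dtw_align(x, y))  # → [(0, 0), (0, 1), (1, 2), (2, 3)]
```

Each returned pair says "frame i of one word matches frame j of the other", which is how aligned frames can serve as weak top-down supervision without any transcriptions.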
2 code implementations • 13 Mar 2020 • Iroro Orife, Julia Kreutzer, Blessing Sibanda, Daniel Whitenack, Kathleen Siminyu, Laura Martinus, Jamiil Toure Ali, Jade Abbott, Vukosi Marivate, Salomon Kabongo, Musie Meressa, Espoir Murhabazi, Orevaoghene Ahia, Elan van Biljon, Arshath Ramkilowan, Adewale Akinfaderin, Alp Öktem, Wole Akin, Ghollah Kioko, Kevin Degila, Herman Kamper, Bonaventure Dossou, Chris Emezue, Kelechi Ogueji, Abdallah Bashir
Africa has over 2000 languages.
1 code implementation • 6 Feb 2020 • Herman Kamper, Yevgen Matusevych, Sharon Goldwater
Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments.
1 code implementation • 11 Dec 2019 • André Nortje, Willie Brink, Herman A. Engelbrecht, Herman Kamper
We propose the Binary Inpainting Network (BINet), an autoencoder framework which incorporates binary inpainting to reinstate interdependencies between adjacent patches, for improved patch-based compression of still images.
1 code implementation • 11 Dec 2019 • André Nortje, Herman A. Engelbrecht, Herman Kamper
Standard video codecs rely on optical flow to guide inter-frame prediction: pixels from reference frames are moved via motion vectors to predict target video frames.
no code implementations • 13 Oct 2019 • Arnu Pretorius, Elan van Biljon, Benjamin van Niekerk, Ryan Eloff, Matthew Reynard, Steve James, Benjamin Rosman, Herman Kamper, Steve Kroon
Our results therefore suggest that, in the shallow-to-moderate depth setting, critical initialisation provides zero performance gains compared to off-critical initialisations, and that searching for off-critical initialisations that might improve training speed or generalisation is likely to be a fruitless endeavour.
no code implementations • 12 Oct 2019 • Arnu Pretorius, Herman Kamper, Steve Kroon
Recent work has established the equivalence between deep neural networks and Gaussian processes (GPs), resulting in so-called neural network Gaussian processes (NNGPs).
no code implementations • 29 Aug 2019 • Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater
Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic?
no code implementations • 24 Apr 2019 • Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu
Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision.
no code implementations • 16 Apr 2019 • Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper
For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable neural networks to unlabelled speech and use the discovered units for speech synthesis.
1 code implementation • 15 Apr 2019 • Herman Kamper, Aristotelis Anastassiou, Karen Livescu
A number of recent studies have started to investigate how speech systems can be trained on untranscribed speech by leveraging accompanying images at training time.
no code implementations • 14 Nov 2018 • Raghav Menon, Herman Kamper, Ewald van der Westhuizen, John Quinn, Thomas Niesler
We compare features for dynamic time warping (DTW) when used to bootstrap keyword spotting (KWS) in an almost zero-resource setting.
1 code implementation • 9 Nov 2018 • Enno Hermann, Herman Kamper, Sharon Goldwater
Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task.
2 code implementations • 9 Nov 2018 • Ryan Eloff, Herman A. Engelbrecht, Herman Kamper
Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter".
1 code implementation • NeurIPS 2018 • Arnu Pretorius, Elan van Biljon, Steve Kroon, Herman Kamper
Simulations and experiments on real-world data confirm that our proposed initialisation is able to stably propagate signals in deep networks, while using an initialisation disregarding noise fails to do so.
2 code implementations • 1 Nov 2018 • Herman Kamper
We investigate unsupervised models that can map a variable-duration speech segment to a fixed-dimensional representation.
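One simple baseline in this acoustic-word-embedding line of work is uniform downsampling: pick a fixed number of frames at uniform intervals and stack them into one vector, so segments of any duration map to the same dimensionality. A sketch (the paper's main models are learned networks, not this baseline):

```python
import numpy as np

# Downsampling sketch: map a variable-length (T, d) segment to a
# fixed n*d-dimensional vector by sampling n frames at uniform
# intervals. A common cheap baseline, not a learned embedding.

def downsample_embed(frames, n=10):
    """Map a (T, d) segment to a fixed n*d-dimensional embedding."""
    T = len(frames)
    idx = np.linspace(0, T - 1, n).round().astype(int)
    return frames[idx].reshape(-1)

short = np.random.default_rng(0).normal(size=(23, 13))   # 23-frame segment
long = np.random.default_rng(1).normal(size=(187, 13))   # 187-frame segment
print(downsample_embed(short).shape, downsample_embed(long).shape)
```

Both segments land in the same 130-dimensional space, so they can be compared with an ordinary vector distance regardless of their original durations.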
1 code implementation • NAACL 2019 • Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater
Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.
Tasks: Automatic Speech Recognition (ASR), +3
no code implementations • 23 Jul 2018 • Raghav Menon, Herman Kamper, Emre Yilmaz, John Quinn, Thomas Niesler
We consider multilingual bottleneck features (BNFs) for nearly zero-resource keyword spotting.
no code implementations • 25 Jun 2018 • Raghav Menon, Herman Kamper, John Quinn, Thomas Niesler
While the resulting CNN keyword spotter cannot match the performance of the DTW-based system, it substantially outperforms a CNN classifier trained only on the keywords, improving the area under the ROC curve from 0.54 to 0.64.
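The ROC AUC quoted here can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A small self-contained sketch of that rank-based reading, on made-up scores:

```python
# Rank-based AUC sketch: P(positive score > negative score),
# counting ties as half. Toy scores only, not the paper's data.

def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (pos, neg) pairs ranked correctly."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # → 5/6 ≈ 0.833
```

On this view, the improvement from 0.54 to 0.64 means the CNN spotter went from near chance (0.5) to ranking a positive above a negative about 64% of the time.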
1 code implementation • ICML 2018 • Arnu Pretorius, Steve Kroon, Herman Kamper
Here we develop theory for how noise influences learning in DAEs.
no code implementations • 13 Jun 2018 • Herman Kamper, Michael Roth
Recent work considered how images paired with speech can be used as supervision for building speech systems when transcriptions are not available.
no code implementations • 24 Mar 2018 • Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater
We explore models trained on between 20 and 160 hours of data, and find that although models trained on less data have considerably lower BLEU scores, they can still predict words with relatively high precision and recall: around 50% for a model trained on 50 hours of data, versus around 60% for the full 160-hour model.
2 code implementations • 5 Oct 2017 • Herman Kamper, Gregory Shakhnarovich, Karen Livescu
We introduce a newly collected data set of human semantic relevance judgements and an associated task, semantic speech retrieval, where the goal is to search for spoken utterances that are semantically relevant to a given text query.
1 code implementation • 12 Jun 2017 • Shane Settle, Keith Levin, Herman Kamper, Karen Livescu
Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments.
2 code implementations • 23 Mar 2017 • Herman Kamper, Karen Livescu, Sharon Goldwater
Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing.
1 code implementation • 23 Mar 2017 • Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech.
no code implementations • EACL 2017 • Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater
We explore the problem of translating speech to text in low-resource scenarios where neither automatic speech recognition (ASR) nor machine translation (MT) are available, but we have training data in the form of audio paired with text translations.
Tasks: Automatic Speech Recognition (ASR), +4
no code implementations • 3 Jan 2017 • Herman Kamper
Finally, we show that the clusters discovered by the segmental Bayesian model can be made less speaker- and gender-specific by using features from the cAE instead of traditional acoustic features.
no code implementations • 21 Sep 2016 • Sameer Bansal, Herman Kamper, Sharon Goldwater, Adam Lopez
Recent work on unsupervised term discovery (UTD) aims to identify and cluster repeated word-like units from audio alone.
5 code implementations • 22 Jun 2016 • Herman Kamper, Aren Jansen, Sharon Goldwater
We also show that the discovered clusters can be made less speaker- and gender-specific by using an unsupervised autoencoder-like feature extractor to learn better frame-level features (prior to embedding).
no code implementations • 9 Mar 2016 • Herman Kamper, Aren Jansen, Sharon Goldwater
In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text.
1 code implementation • 5 Oct 2015 • Herman Kamper, Weiran Wang, Karen Livescu
Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units.