Search Results for author: Herman Kamper

Found 72 papers, 35 papers with code

Deep convolutional acoustic word embeddings using word-pair side information

1 code implementation · 5 Oct 2015 · Herman Kamper, Weiran Wang, Karen Livescu

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units.

speech-recognition Speech Recognition +1

Unsupervised word segmentation and lexicon discovery using acoustic word embeddings

no code implementations · 9 Mar 2016 · Herman Kamper, Aren Jansen, Sharon Goldwater

In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text.

Language Acquisition Language Modelling +1

A segmental framework for fully-unsupervised large-vocabulary speech recognition

5 code implementations · 22 Jun 2016 · Herman Kamper, Aren Jansen, Sharon Goldwater

We also show that the discovered clusters can be made less speaker- and gender-specific by using an unsupervised autoencoder-like feature extractor to learn better frame-level features (prior to embedding).

Language Modelling Speech Recognition +1

Weakly supervised spoken term discovery using cross-lingual side information

no code implementations · 21 Sep 2016 · Sameer Bansal, Herman Kamper, Sharon Goldwater, Adam Lopez

Recent work on unsupervised term discovery (UTD) aims to identify and cluster repeated word-like units from audio alone.

Unsupervised neural and Bayesian models for zero-resource speech processing

no code implementations · 3 Jan 2017 · Herman Kamper

Finally, we show that the clusters discovered by the segmental Bayesian model can be made less speaker- and gender-specific by using features from the cAE instead of traditional acoustic features.

Clustering Language Modelling +1

Towards speech-to-text translation without speech recognition

no code implementations · EACL 2017 · Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

We explore the problem of translating speech to text in low-resource scenarios where neither automatic speech recognition (ASR) nor machine translation (MT) are available, but we have training data in the form of audio paired with text translations.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

An embedded segmental K-means model for unsupervised segmentation and clustering of speech

2 code implementations · 23 Mar 2017 · Herman Kamper, Karen Livescu, Sharon Goldwater

Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing.

Bayesian Inference Clustering +2

Visually grounded learning of keyword prediction from untranscribed speech

1 code implementation · 23 Mar 2017 · Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu

In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech.

Language Acquisition TAG

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings

1 code implementation · 12 Jun 2017 · Shane Settle, Keith Levin, Herman Kamper, Karen Livescu

Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments.

Dynamic Time Warping Word Embeddings
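As context for the DTW baseline this paper compares against, here is a minimal sketch of classic dynamic time warping over two feature sequences. The function name `dtw_cost` and the query-length normalisation are illustrative choices, not taken from the paper.

```python
import numpy as np

def dtw_cost(query, segment):
    """Alignment cost between two feature sequences (frames x dims),
    using classic DTW with Euclidean frame-to-frame distances."""
    n, m = len(query), len(segment)
    # dist[i, j]: distance between query frame i and segment frame j
    dist = np.linalg.norm(query[:, None, :] - segment[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # step in the query
                acc[i, j - 1],      # step in the segment
                acc[i - 1, j - 1],  # match both
            )
    # Normalise by query length so costs are comparable across segments
    return acc[n, m] / n
```

In a query-by-example search, candidate segments would be ranked by this cost; the embedding-based approach instead compares fixed-dimensional vectors, avoiding the quadratic alignment.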

Semantic speech retrieval with a visually grounded model of untranscribed speech

2 code implementations · 5 Oct 2017 · Herman Kamper, Gregory Shakhnarovich, Karen Livescu

We introduce a newly collected data set of human semantic relevance judgements and an associated task, semantic speech retrieval, where the goal is to search for spoken utterances that are semantically relevant to a given text query.

Language Acquisition Retrieval

Low-Resource Speech-to-Text Translation

no code implementations · 24 Mar 2018 · Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater

We explore models trained on between 20 and 160 hours of data, and find that although models trained on less data have considerably lower BLEU scores, they can still predict words with relatively high precision and recall: around 50% for a model trained on 50 hours of data, versus around 60% for the full 160-hour model.

Machine Translation speech-recognition +3

Visually grounded cross-lingual keyword spotting in speech

no code implementations · 13 Jun 2018 · Herman Kamper, Michael Roth

Recent work considered how images paired with speech can be used as supervision for building speech systems when transcriptions are not available.

Keyword Spotting Visual Grounding

Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring

no code implementations · 25 Jun 2018 · Raghav Menon, Herman Kamper, John Quinn, Thomas Niesler

While the resulting CNN keyword spotter cannot match the performance of the DTW-based system, it substantially outperforms a CNN classifier trained only on the keywords, improving the area under the ROC curve from 0.54 to 0.64.

Dynamic Time Warping Humanitarian +2

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

1 code implementation · NAACL 2019 · Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater

Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1 BLEU.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models

2 code implementations · 1 Nov 2018 · Herman Kamper

We investigate unsupervised models that can map a variable-duration speech segment to a fixed-dimensional representation.

Word Embeddings
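A common reference point for mapping a variable-duration segment to a fixed-dimensional vector, often used as a baseline in this line of work, is uniform downsampling. The sketch below is that baseline, not the paper's encoder-decoder model; the function name `downsample_embed` is hypothetical.

```python
import numpy as np

def downsample_embed(frames, n=10):
    """Baseline acoustic word embedding: uniformly sample n frames from a
    variable-length feature sequence (frames x dims) and flatten them into
    a fixed-dimensional vector."""
    idx = np.linspace(0, len(frames) - 1, n).round().astype(int)
    return frames[idx].ravel()
```

Segments of any duration then map to vectors of the same dimensionality (here n times the frame dimension), so they can be compared with ordinary vector distances.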

Critical initialisation for deep signal propagation in noisy rectifier neural networks

1 code implementation · NeurIPS 2018 · Arnu Pretorius, Elan van Biljon, Steve Kroon, Herman Kamper

Simulations and experiments on real-world data confirm that our proposed initialisation is able to stably propagate signals in deep networks, while using an initialisation disregarding noise fails to do so.

Multimodal One-Shot Learning of Speech and Images

2 code implementations · 9 Nov 2018 · Ryan Eloff, Herman A. Engelbrecht, Herman Kamper

Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter".

Dynamic Time Warping One-Shot Learning

Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages

1 code implementation · 9 Nov 2018 · Enno Hermann, Herman Kamper, Sharon Goldwater

Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task.

Clustering

Semantic query-by-example speech search using visual grounding

1 code implementation · 15 Apr 2019 · Herman Kamper, Aristotelis Anastassiou, Karen Livescu

A number of recent studies have started to investigate how speech systems can be trained on untranscribed speech by leveraging accompanying images at training time.

Retrieval Semantic Retrieval +1

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval

no code implementations · 24 Apr 2019 · Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu

Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision.

Retrieval Visual Grounding

Cross-lingual topic prediction for speech using translations

no code implementations · 29 Aug 2019 · Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic?

Humanitarian Speech-to-Text Translation +1

On the expected behaviour of noise regularised deep neural networks as Gaussian processes

no code implementations · 12 Oct 2019 · Arnu Pretorius, Herman Kamper, Steve Kroon

Recent work has established the equivalence between deep neural networks and Gaussian processes (GPs), resulting in so-called neural network Gaussian processes (NNGPs).

Gaussian Processes

If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks

no code implementations · 13 Oct 2019 · Arnu Pretorius, Elan van Biljon, Benjamin van Niekerk, Ryan Eloff, Matthew Reynard, Steve James, Benjamin Rosman, Herman Kamper, Steve Kroon

Our results therefore suggest that, in the shallow-to-moderate depth setting, critical initialisation provides zero performance gains when compared to off-critical initialisations and that searching for off-critical initialisations that might improve training speed or generalisation, is likely to be a fruitless endeavour.

BINet: a binary inpainting network for deep patch-based image compression

1 code implementation · 11 Dec 2019 · André Nortje, Willie Brink, Herman A. Engelbrecht, Herman Kamper

We propose the Binary Inpainting Network (BINet), an autoencoder framework which incorporates binary inpainting to reinstate interdependencies between adjacent patches, for improved patch-based compression of still images.

Image Compression

Deep motion estimation for parallel inter-frame prediction in video compression

1 code implementation · 11 Dec 2019 · André Nortje, Herman A. Engelbrecht, Herman Kamper

Standard video codecs rely on optical flow to guide inter-frame prediction: pixels from reference frames are moved via motion vectors to predict target video frames.

Motion Estimation Optical Flow Estimation +1

Unsupervised feature learning for speech using correspondence and Siamese networks

no code implementations · 28 Mar 2020 · Petri-Johan Last, Herman A. Engelbrecht, Herman Kamper

Dynamic programming is then used to align the feature frames between each word pair, serving as weak top-down supervision for the two models.

Analyzing autoencoder-based acoustic word embeddings

no code implementations · 3 Apr 2020 · Yevgen Matusevych, Herman Kamper, Sharon Goldwater

To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs.

Word Embeddings

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

1 code implementation · 2 Jun 2020 · Herman Kamper, Yevgen Matusevych, Sharon Goldwater

We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.

speech-recognition Speech Recognition +1

Evaluating computational models of infant phonetic learning across languages

no code implementations · 6 Aug 2020 · Yevgen Matusevych, Thomas Schatz, Herman Kamper, Naomi H. Feldman, Sharon Goldwater

In the first year of life, infants' speech perception becomes attuned to the sounds of their native language.

Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images

1 code implementation · 14 Aug 2020 · Leanne Nortje, Herman Kamper

Here we compare transfer learning to unsupervised models trained on unlabelled in-domain data.

Transfer Learning

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings

no code implementations · 3 Dec 2020 · Puyuan Peng, Herman Kamper, Karen Livescu

We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.

Word Embeddings

Direct multimodal few-shot learning of speech and images

1 code implementation · 10 Dec 2020 · Leanne Nortje, Herman Kamper

We propose direct multimodal few-shot models that learn a shared embedding space of spoken words and images from only a few paired examples.

Few-Shot Learning Transfer Learning

Towards localisation of keywords in speech using weak supervision

no code implementations · 14 Dec 2020 · Kayode Olaleye, Benjamin van Niekerk, Herman Kamper

Of the two forms of supervision, the visually trained model performs worse than the BoW-trained model.

Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

no code implementations · 14 Dec 2020 · Herman Kamper, Benjamin van Niekerk

We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units.

Clustering Segmentation
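To illustrate how a frame-level code sequence yields a variable-rate segmentation, the sketch below greedily merges runs of identical codes into segments. The paper instead constrains a pretrained VQ model with dynamic programming, so this is a simplification; the function name `merge_codes` is hypothetical.

```python
def merge_codes(codes):
    """Collapse runs of identical VQ codes into (code, start, end) segments,
    giving a variable-rate segmentation of the frame sequence.
    Frame indices are half-open: [start, end)."""
    segments = []
    start = 0
    for i in range(1, len(codes) + 1):
        # Close the current segment at the end of the sequence or when the code changes
        if i == len(codes) or codes[i] != codes[start]:
            segments.append((codes[start], start, i))
            start = i
    return segments
```

For example, the code sequence [3, 3, 3, 7, 7, 1] merges into three segments covering frames 0-3, 3-5, and 5-6.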

A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings

no code implementations · 14 Dec 2020 · Lisa van Staden, Herman Kamper

We compare frame-level features from contrastive predictive coding (CPC), autoregressive predictive coding and a CAE to conventional MFCCs.

Representation Learning Word Embeddings

A phonetic model of non-native spoken word processing

no code implementations · EACL 2021 · Yevgen Matusevych, Herman Kamper, Thomas Schatz, Naomi H. Feldman, Sharon Goldwater

We then test the model on a spoken word processing task, showing that phonology may not be necessary to explain some of the word processing effects observed in non-native speakers.

Attribute

StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts

no code implementations · 31 May 2021 · Matthew Baas, Herman Kamper

We specifically extend the recent StarGAN-VC model by conditioning it on a speaker embedding (from a potentially unseen speaker).

Voice Conversion

Attention-Based Keyword Localisation in Speech using Visual Grounding

no code implementations · 16 Jun 2021 · Kayode Olaleye, Herman Kamper

Visually grounded speech models learn from images paired with spoken captions.

Visual Grounding

Feature learning for efficient ASR-free keyword spotting in low-resource languages

no code implementations · 13 Aug 2021 · Ewald van der Westhuizen, Herman Kamper, Raghav Menon, John Quinn, Thomas Niesler

We show that, using these features, the CNN-DTW keyword spotter performs almost as well as the DTW keyword spotter while outperforming a baseline CNN trained only on the keyword templates.

Dynamic Time Warping Humanitarian +1

Voice Conversion Can Improve ASR in Very Low-Resource Settings

no code implementations · 4 Nov 2021 · Matthew Baas, Herman Kamper

In this work we assess whether a VC system can be used cross-lingually to improve low-resource speech recognition.

Data Augmentation speech-recognition +2

Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel

no code implementations · 4 Nov 2021 · Kevin Eloff, Okko Räsänen, Herman A. Engelbrecht, Arnu Pretorius, Herman Kamper

Multi-agent reinforcement learning has been used as an effective means to study emergent communication between agents, yet little focus has been given to continuous acoustic communication.

Language Acquisition Multi-agent Reinforcement Learning +3

Keyword localisation in untranscribed speech using visually grounded speech models

1 code implementation · 2 Feb 2022 · Kayode Olaleye, Dan Oneata, Herman Kamper

Masked-based localisation gives some of the best reported localisation scores from a VGS model, with an accuracy of 57% when the system knows that a keyword occurs in an utterance and needs to predict its location.

Keyword Spotting TAG

Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring

3 code implementations · 24 Feb 2022 · Herman Kamper

This paper instead revisits an older approach to word segmentation: bottom-up phone-like unit discovery is performed first, and symbolic word segmentation is then performed on top of the discovered units (without influencing the lower level).

Acoustic Unit Discovery Segmentation

YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding

no code implementations · 10 Oct 2022 · Kayode Olaleye, Dan Oneata, Herman Kamper

We collect and release a new single-speaker dataset of audio captions for 6k Flickr images in Yorùbá, a real low-resource language spoken in Nigeria.

Visual Grounding

GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models

1 code implementation · 11 Oct 2022 · Matthew Baas, Herman Kamper

As in the StyleGAN family of image synthesis models, ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer.

Disentanglement Generative Adversarial Network +2

Towards visually prompted keyword localisation for zero-resource spoken languages

1 code implementation · 12 Oct 2022 · Leanne Nortje, Herman Kamper

We formalise this task and call it visually prompted keyword localisation (VPKL): given an image of a keyword, detect and predict where in an utterance the keyword occurs.

TransFusion: Transcribing Speech with Multinomial Diffusion

1 code implementation · 14 Oct 2022 · Matthew Baas, Kevin Eloff, Herman Kamper

In this work we aim to see whether the benefits of diffusion models can also be realized for speech recognition.

Denoising Image Generation +3

Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning

1 code implementation · 22 May 2023 · Ruan van der Merwe, Herman Kamper

We consider the problem of few-shot spoken word classification in a setting where a model is incrementally introduced to new word classes.

Continual Learning Meta-Learning

Visually grounded few-shot word acquisition with fewer shots

no code implementations · 25 May 2023 · Leanne Nortje, Benjamin van Niekerk, Herman Kamper

Our approach involves using the given word-image example pairs to mine new unsupervised word-image training pairs from large collections of unlabelled speech and images.

Voice Conversion With Just Nearest Neighbors

1 code implementation · 30 May 2023 · Matthew Baas, Benjamin van Niekerk, Herman Kamper

Any-to-any voice conversion aims to transform source speech into a target voice with just a few examples of the target speaker as a reference.

 Ranked #1 on Voice Conversion on LibriSpeech test-clean (using extra training data)

Voice Conversion
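The core idea of nearest-neighbour voice conversion can be sketched as replacing each source feature frame with the mean of its nearest neighbours among target-speaker frames. The snippet below is an illustrative simplification (a real system would match self-supervised features and vocode the converted sequence back to audio); the function name `knn_convert` is hypothetical.

```python
import numpy as np

def knn_convert(source_feats, target_feats, k=4):
    """Replace each source frame with the mean of its k most similar
    target-speaker frames under cosine similarity."""
    def normalise(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    # Cosine similarity between every source frame and every target frame
    sims = normalise(source_feats) @ normalise(target_feats).T
    # Indices of the k most similar target frames per source frame
    nearest = np.argsort(-sims, axis=1)[:, :k]
    return target_feats[nearest].mean(axis=1)
```

Because the output is built entirely from target-speaker frames, it carries the target's voice characteristics while following the source's frame sequence.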

Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili

no code implementations · 1 Jun 2023 · Christiaan Jacobs, Nathanaël Carraz Rakotonirina, Everlyn Asiko Chimoto, Bruce A. Bassett, Herman Kamper

But in an in-the-wild test on Swahili radio broadcasts with actual hate speech keywords, the AWE model (using one minute of template data) is more robust, giving similar performance to an ASR system trained on 30 hours of labelled data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Visually grounded few-shot word learning in low-resource settings

no code implementations · 20 Jun 2023 · Leanne Nortje, Dan Oneata, Herman Kamper

We propose an approach that can work on natural word-image pairs but with fewer examples, i.e. fewer shots, and then illustrate how this approach can be applied for multimodal few-shot learning in a real low-resource language, Yorùbá.

Few-Shot Learning

Disentanglement in a GAN for Unconditional Speech Synthesis

1 code implementation · 4 Jul 2023 · Matthew Baas, Herman Kamper

We confirm that ASGAN's latent space is disentangled: we demonstrate how simple linear operations in the space can be used to perform several tasks unseen during training.

Disentanglement Generative Adversarial Network +5

Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings

no code implementations · 5 Jul 2023 · Christiaan Jacobs, Herman Kamper

Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content so that different realisations of the same word have similar embeddings.

Word Embeddings Word Similarity

Rhythm Modeling for Voice Conversion

1 code implementation · 12 Jul 2023 · Benjamin van Niekerk, Marc-André Carbonneau, Herman Kamper

Voice conversion aims to transform source speech into a different target voice.

Voice Conversion

Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices

no code implementations · 12 Oct 2023 · Matthew Baas, Herman Kamper

Nevertheless, this shows that voice conversion models - and kNN-VC in particular - are increasingly applicable in a range of non-standard downstream tasks.

Voice Conversion

Visually Grounded Speech Models have a Mutual Exclusivity Bias

no code implementations · 20 Mar 2024 · Leanne Nortje, Dan Oneaţă, Yevgen Matusevych, Herman Kamper

To simulate prior acoustic and visual knowledge, we experiment with several initialisation strategies using pretrained speech and vision networks.

LiSTra Automatic Speech Translation: English to Lingala Case Study

no code implementations · DCLRL (LREC) 2022 · Salomon Kabongo Kabenamualu, Vukosi Marivate, Herman Kamper

In recent years there has been great interest in addressing the data scarcity of African languages and providing baseline models for different Natural Language Processing tasks (Orife et al., 2020).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
