Search Results for author: Herman Kamper

Found 54 papers, 25 papers with code

Voice Conversion Can Improve ASR in Very Low-Resource Settings

no code implementations · 4 Nov 2021 · Matthew Baas, Herman Kamper

In this work we assess whether a VC system can be used cross-lingually to improve low-resource speech recognition.

Data Augmentation · Speech Recognition · +1

Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel

no code implementations · 4 Nov 2021 · Kevin Eloff, Arnu Pretorius, Okko Räsänen, Herman A. Engelbrecht, Herman Kamper

The Speaker is equipped with a vocoder that maps symbols to a continuous waveform; this waveform is passed over a lossy continuous channel, and the Listener must map the received signal back to the concept.

Multi-agent Reinforcement Learning · Q-Learning

Feature learning for efficient ASR-free keyword spotting in low-resource languages

no code implementations · 13 Aug 2021 · Ewald van der Westhuizen, Herman Kamper, Raghav Menon, John Quinn, Thomas Niesler

We show that, using these features, the CNN-DTW keyword spotter performs almost as well as the DTW keyword spotter while outperforming a baseline CNN trained only on the keyword templates.

Dynamic Time Warping · Keyword Spotting

Attention-Based Keyword Localisation in Speech using Visual Grounding

no code implementations · 16 Jun 2021 · Kayode Olaleye, Herman Kamper

Visually grounded speech models learn from images paired with spoken captions.

Visual Grounding

StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts

no code implementations · 31 May 2021 · Matthew Baas, Herman Kamper

We specifically extend the recent StarGAN-VC model by conditioning it on a speaker embedding (from a potentially unseen speaker).

Voice Conversion

A phonetic model of non-native spoken word processing

no code implementations · EACL 2021 · Yevgen Matusevych, Herman Kamper, Thomas Schatz, Naomi H. Feldman, Sharon Goldwater

We then test the model on a spoken word processing task, showing that phonology may not be necessary to explain some of the word processing effects observed in non-native speakers.

A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings

no code implementations · 14 Dec 2020 · Lisa van Staden, Herman Kamper

We compare frame-level features from contrastive predictive coding (CPC), autoregressive predictive coding and a CAE to conventional MFCCs.

Word Embeddings

Towards localisation of keywords in speech using weak supervision

no code implementations · 14 Dec 2020 · Kayode Olaleye, Benjamin van Niekerk, Herman Kamper

Of the two forms of supervision, the visually trained model performs worse than the BoW-trained model.

Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

no code implementations · 14 Dec 2020 · Herman Kamper, Benjamin van Niekerk

We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units.
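The variable-rate segmentation idea can be sketched in a few lines: once a VQ model has assigned each frame a discrete code, merging runs of identical codes yields segment boundaries. This is only a toy illustration of the constraint described above, not the authors' implementation (the pretrained VQ networks in the paper are neural models):

```python
def segment_codes(codes):
    """Merge runs of identical frame-level VQ codes into (start, end, code)
    segments, giving a variable-rate discretisation of the utterance."""
    segments = []
    start = 0
    for i in range(1, len(codes) + 1):
        # Close a segment when the code changes or the utterance ends.
        if i == len(codes) or codes[i] != codes[start]:
            segments.append((start, i, codes[start]))  # end index exclusive
            start = i
    return segments

# Frame-level codes for a toy 9-frame utterance:
print(segment_codes([5, 5, 5, 2, 2, 7, 7, 7, 7]))
# -> [(0, 3, 5), (3, 5, 2), (5, 9, 7)]
```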

Direct multimodal few-shot learning of speech and images

1 code implementation · 10 Dec 2020 · Leanne Nortje, Herman Kamper

We propose direct multimodal few-shot models that learn a shared embedding space of spoken words and images from only a few paired examples.

Few-Shot Learning · Transfer Learning

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings

no code implementations · 3 Dec 2020 · Puyuan Peng, Herman Kamper, Karen Livescu

We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation.

Word Embeddings

Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images

1 code implementation · 14 Aug 2020 · Leanne Nortje, Herman Kamper

Here we compare transfer learning to unsupervised models trained on unlabelled in-domain data.

Transfer Learning

Evaluating computational models of infant phonetic learning across languages

no code implementations · 6 Aug 2020 · Yevgen Matusevych, Thomas Schatz, Herman Kamper, Naomi H. Feldman, Sharon Goldwater

In the first year of life, infants' speech perception becomes attuned to the sounds of their native language.

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

1 code implementation · 2 Jun 2020 · Herman Kamper, Yevgen Matusevych, Sharon Goldwater

We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.

Speech Recognition · Word Embeddings
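The same/different training objective used by the Siamese RNN above can be illustrated with a standard contrastive loss on fixed embeddings. This is a minimal sketch of the loss only (the paper's models are recurrent networks over speech frames; the function name and margin value here are illustrative):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss on a pair of acoustic word embeddings: pull
    embeddings of the same word together, push embeddings of different
    words at least `margin` apart."""
    dist = np.linalg.norm(emb_a - emb_b)
    if same:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2

a = np.array([0.1, 0.9])
b = np.array([0.2, 0.8])  # nearby embedding
print(contrastive_loss(a, b, same=True))   # small: same-word pair already close
print(contrastive_loss(a, b, same=False))  # large: different words too close
```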

Analyzing autoencoder-based acoustic word embeddings

no code implementations · 3 Apr 2020 · Yevgen Matusevych, Herman Kamper, Sharon Goldwater

To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs.

Word Embeddings

Unsupervised feature learning for speech using correspondence and Siamese networks

no code implementations · 28 Mar 2020 · Petri-Johan Last, Herman A. Engelbrecht, Herman Kamper

Dynamic programming is then used to align the feature frames between each word pair, serving as weak top-down supervision for the two models.

BINet: a binary inpainting network for deep patch-based image compression

1 code implementation · 11 Dec 2019 · André Nortje, Willie Brink, Herman A. Engelbrecht, Herman Kamper

We propose the Binary Inpainting Network (BINet), an autoencoder framework which incorporates binary inpainting to reinstate interdependencies between adjacent patches, for improved patch-based compression of still images.

Image Compression

Deep motion estimation for parallel inter-frame prediction in video compression

1 code implementation · 11 Dec 2019 · André Nortje, Herman A. Engelbrecht, Herman Kamper

Standard video codecs rely on optical flow to guide inter-frame prediction: pixels from reference frames are moved via motion vectors to predict target video frames.

Motion Estimation · Optical Flow Estimation · +1
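The inter-frame prediction mechanism described above can be sketched as block-based motion compensation: each block of the target frame is predicted by copying a block from the reference frame, displaced by a motion vector. A toy illustration under simplifying assumptions (integer motion vectors, clamped borders), not the codec or the paper's model:

```python
import numpy as np

def predict_frame(reference, motion_vectors, block=4):
    """Predict a target frame by copying each block from the reference
    frame, displaced by that block's (dy, dx) motion vector."""
    h, w = reference.shape
    pred = np.zeros_like(reference)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors[by // block][bx // block]
            # Clamp the source block so it stays inside the reference frame.
            sy = min(max(by + dy, 0), h - block)
            sx = min(max(bx + dx, 0), w - block)
            pred[by:by + block, bx:bx + block] = reference[sy:sy + block, sx:sx + block]
    return pred

ref = np.arange(64, dtype=np.float32).reshape(8, 8)
mv = [[(0, 0), (1, 0)], [(0, 1), (0, 0)]]  # one (dy, dx) vector per 4x4 block
print(predict_frame(ref, mv))
```

In a real codec the residual between the prediction and the true target frame is what gets encoded; the motion vectors themselves come from a search or, in the paper's setting, a learned estimator.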

If dropout limits trainable depth, does critical initialisation still matter? A large-scale statistical analysis on ReLU networks

no code implementations · 13 Oct 2019 · Arnu Pretorius, Elan van Biljon, Benjamin van Niekerk, Ryan Eloff, Matthew Reynard, Steve James, Benjamin Rosman, Herman Kamper, Steve Kroon

Our results therefore suggest that, in the shallow-to-moderate depth setting, critical initialisation provides zero performance gains when compared to off-critical initialisations and that searching for off-critical initialisations that might improve training speed or generalisation, is likely to be a fruitless endeavour.

On the expected behaviour of noise regularised deep neural networks as Gaussian processes

no code implementations · 12 Oct 2019 · Arnu Pretorius, Herman Kamper, Steve Kroon

Recent work has established the equivalence between deep neural networks and Gaussian processes (GPs), resulting in so-called neural network Gaussian processes (NNGPs).

Gaussian Processes

Cross-lingual topic prediction for speech using translations

no code implementations · 29 Aug 2019 · Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic?

Speech-to-Text Translation · Translation

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval

no code implementations · 24 Apr 2019 · Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu

Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision.

Visual Grounding

Semantic query-by-example speech search using visual grounding

no code implementations · 15 Apr 2019 · Herman Kamper, Aristotelis Anastassiou, Karen Livescu

A number of recent studies have started to investigate how speech systems can be trained on untranscribed speech by leveraging accompanying images at training time.

Semantic Retrieval · Visual Grounding

Multimodal One-Shot Learning of Speech and Images

2 code implementations · 9 Nov 2018 · Ryan Eloff, Herman A. Engelbrecht, Herman Kamper

Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter".

Dynamic Time Warping · One-Shot Learning

Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages

1 code implementation · 9 Nov 2018 · Enno Hermann, Herman Kamper, Sharon Goldwater

Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task.

Critical initialisation for deep signal propagation in noisy rectifier neural networks

1 code implementation · NeurIPS 2018 · Arnu Pretorius, Elan van Biljon, Steve Kroon, Herman Kamper

Simulations and experiments on real-world data confirm that our proposed initialisation is able to stably propagate signals in deep networks, while using an initialisation disregarding noise fails to do so.

Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models

2 code implementations · 1 Nov 2018 · Herman Kamper

We investigate unsupervised models that can map a variable-duration speech segment to a fixed-dimensional representation.

Word Embeddings

Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

1 code implementation · NAACL 2019 · Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater

Finally, we show that the approach improves performance on a true low-resource task: pre-training on a combination of English ASR and French ASR improves Mboshi-French ST, where only 4 hours of data are available, from 3.5 to 7.1.

automatic-speech-recognition · Speech Recognition · +2

Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring

no code implementations · 25 Jun 2018 · Raghav Menon, Herman Kamper, John Quinn, Thomas Niesler

While the resulting CNN keyword spotter cannot match the performance of the DTW-based system, it substantially outperforms a CNN classifier trained only on the keywords, improving the area under the ROC curve from 0.54 to 0.64.

Dynamic Time Warping · Keyword Spotting · +1

Visually grounded cross-lingual keyword spotting in speech

no code implementations · 13 Jun 2018 · Herman Kamper, Michael Roth

Recent work considered how images paired with speech can be used as supervision for building speech systems when transcriptions are not available.

Keyword Spotting · Visual Grounding

Low-Resource Speech-to-Text Translation

no code implementations · 24 Mar 2018 · Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater

We explore models trained on between 20 and 160 hours of data, and find that although models trained on less data have considerably lower BLEU scores, they can still predict words with relatively high precision and recall: around 50% for a model trained on 50 hours of data, versus around 60% for the full 160-hour model.

Machine Translation · Speech Recognition · +2

Semantic speech retrieval with a visually grounded model of untranscribed speech

2 code implementations · 5 Oct 2017 · Herman Kamper, Gregory Shakhnarovich, Karen Livescu

We introduce a newly collected data set of human semantic relevance judgements and an associated task, semantic speech retrieval, where the goal is to search for spoken utterances that are semantically relevant to a given text query.

Language Acquisition

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings

1 code implementation · 12 Jun 2017 · Shane Settle, Keith Levin, Herman Kamper, Karen Livescu

Query-by-example search often uses dynamic time warping (DTW) for comparing queries and proposed matching segments.

Dynamic Time Warping · Word Embeddings
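The DTW comparison mentioned above can be sketched with the classic dynamic-programming recurrence: the cost of aligning a query to a candidate segment, allowing non-linear time stretching. A minimal illustration with Euclidean frame distances and simple length normalisation, not the paper's (discriminative embedding) approach:

```python
import numpy as np

def dtw_distance(query, segment):
    """Dynamic time warping cost between two feature sequences
    (frames x dims), allowing non-linear time alignment."""
    n, m = len(query), len(segment)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(query[i - 1] - segment[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)  # length-normalised alignment cost

q = np.array([[0.0], [1.0], [2.0]])
s = np.array([[0.0], [0.0], [1.0], [2.0]])  # same pattern, stretched in time
print(dtw_distance(q, s))  # -> 0.0: DTW absorbs the timing difference
```

Embedding-based approaches replace this quadratic-time comparison with a single distance between fixed-dimensional vectors, which is what makes them attractive for large-scale search.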

Visually grounded learning of keyword prediction from untranscribed speech

1 code implementation · 23 Mar 2017 · Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu

In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech.

Language Acquisition

An embedded segmental K-means model for unsupervised segmentation and clustering of speech

2 code implementations · 23 Mar 2017 · Herman Kamper, Karen Livescu, Sharon Goldwater

Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing.

Bayesian Inference · Word Embeddings

Towards speech-to-text translation without speech recognition

no code implementations · EACL 2017 · Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

We explore the problem of translating speech to text in low-resource scenarios where neither automatic speech recognition (ASR) nor machine translation (MT) are available, but we have training data in the form of audio paired with text translations.

automatic-speech-recognition · Machine Translation · +3

Unsupervised neural and Bayesian models for zero-resource speech processing

no code implementations · 3 Jan 2017 · Herman Kamper

Finally, we show that the clusters discovered by the segmental Bayesian model can be made less speaker- and gender-specific by using features from the cAE instead of traditional acoustic features.

Language Modelling · Representation Learning

Weakly supervised spoken term discovery using cross-lingual side information

no code implementations · 21 Sep 2016 · Sameer Bansal, Herman Kamper, Sharon Goldwater, Adam Lopez

Recent work on unsupervised term discovery (UTD) aims to identify and cluster repeated word-like units from audio alone.

A segmental framework for fully-unsupervised large-vocabulary speech recognition

5 code implementations · 22 Jun 2016 · Herman Kamper, Aren Jansen, Sharon Goldwater

We also show that the discovered clusters can be made less speaker- and gender-specific by using an unsupervised autoencoder-like feature extractor to learn better frame-level features (prior to embedding).

Language Modelling · Large Vocabulary Continuous Speech Recognition · +2

Unsupervised word segmentation and lexicon discovery using acoustic word embeddings

no code implementations · 9 Mar 2016 · Herman Kamper, Aren Jansen, Sharon Goldwater

In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text.

Language Acquisition · Language Modelling · +2

Deep convolutional acoustic word embeddings using word-pair side information

1 code implementation · 5 Oct 2015 · Herman Kamper, Weiran Wang, Karen Livescu

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units.

Speech Recognition Word Embeddings
