Search Results for author: Hugo Van hamme

Found 54 papers, 12 papers with code

MSNER: A Multilingual Speech Dataset for Named Entity Recognition

no code implementations19 May 2024 Quentin Meeus, Marie-Francine Moens, Hugo Van hamme

While extensively explored in text-based tasks, Named Entity Recognition (NER) remains largely neglected in spoken language understanding.

Named Entity Recognition +2

Detecting Post-Stroke Aphasia Via Brain Responses to Speech in a Deep Learning Framework

no code implementations17 Jan 2024 Pieter De Clercq, Corentin Puffay, Jill Kries, Hugo Van hamme, Maaike Vandermosten, Tom Francart, Jonas Vanthornhout

We modeled electroencephalography (EEG) responses to acoustic, segmentation, and linguistic speech representations of a story using convolutional neural networks trained on a large sample of healthy participants, serving as a model for intact neural tracking of speech.

EEG

Unsupervised Accent Adaptation Through Masked Language Model Correction Of Discrete Self-Supervised Speech Units

no code implementations25 Sep 2023 Jakob Poncelet, Hugo Van hamme

Self-supervised pre-trained speech models have strongly improved speech recognition, yet they are still sensitive to domain shifts and accented or atypical speech.

Accented Speech Recognition Language Modelling +1

Analysis of XLS-R for Speech Quality Assessment

1 code implementation23 Aug 2023 Bastiaan Tamm, Rik Vandenberghe, Hugo Van hamme

In online conferencing applications, estimating the perceived quality of an audio signal is crucial to ensure high quality of experience for the end user.

The role of vowel and consonant onsets in neural tracking of natural speech

no code implementations31 Jul 2023 Mohammad Jalilpour Monesi, Jonas Vanthornhout, Hugo Van hamme, Tom Francart

Our results show that vowel-consonant onsets outperform onsets of any phone in both tasks, which suggests that neural tracking of the vowel-consonant distinction exists in the EEG to some degree.

EEG

Rehearsal-Free Online Continual Learning for Automatic Speech Recognition

1 code implementation19 Jun 2023 Steven Vander Eeckt, Hugo Van hamme

Fine-tuning an Automatic Speech Recognition (ASR) model to new domains results in degradation on original domains, referred to as Catastrophic Forgetting (CF).

Automatic Speech Recognition (ASR) +2

Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion and Householder Transformation

no code implementations12 Jun 2023 Jinzi Qi, Hugo Van hamme

In dysarthric speech recognition, data scarcity and the vast diversity between dysarthric speakers pose significant challenges.

Speech Recognition +1

CLASH: Contrastive learning through alignment shifting to extract stimulus information from EEG

no code implementations9 Jan 2023 Bernd Accou, Hugo Van hamme, Tom Francart

Additionally, we show that in contrast to the baseline denoising techniques, our method can be used with data of unseen subjects and stimuli without retraining, improving decoding performance by 19% and 34% over raw EEG for two holdout datasets.

Contrastive Learning Denoising +1

Multitask Learning for Low Resource Spoken Language Understanding

no code implementations24 Nov 2022 Quentin Meeus, Marie-Francine Moens, Hugo Van hamme

We explore the benefits that multitask learning offers to speech processing as we train models on dual objectives combining automatic speech recognition with intent classification or sentiment classification.

Automatic Speech Recognition (ASR) +7

Bidirectional Representations for Low Resource Spoken Language Understanding

no code implementations24 Nov 2022 Quentin Meeus, Marie-Francine Moens, Hugo Van hamme

Class attention can be used to visually explain the predictions of our model, which goes a long way in understanding how the model makes predictions.

Automatic Speech Recognition (ASR) +4

Impact of visual assistance for automated audio captioning

no code implementations18 Nov 2022 Wim Boes, Hugo Van hamme

More specifically, visual features focusing on semantics appear appropriate in the context of automated audio captioning, while for sound event detection, time information seems to be more important.

Audio Captioning Event Detection +2

Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition

no code implementations27 Oct 2022 Steven Vander Eeckt, Hugo Van hamme

Adapting a trained Automatic Speech Recognition (ASR) model to new tasks results in catastrophic forgetting of old tasks, limiting the model's ability to learn continually and to be extended to new speakers, dialects, languages, etc.

Automatic Speech Recognition (ASR) +2
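The core idea of weight averaging admits a very small sketch: after fine-tuning on a new task, interpolate parameter-wise between the old and new model weights. The `average_weights` helper and the flat parameter dictionaries below are hypothetical illustrations, not the paper's code.

```python
def average_weights(old_model, new_model, alpha=0.5):
    """Parameter-wise interpolation between the original and the
    fine-tuned model; alpha is the weight given to the new-task
    parameters (alpha=0 keeps the old model, alpha=1 the new one)."""
    return {
        name: (1.0 - alpha) * old_model[name] + alpha * new_model[name]
        for name in old_model
    }

# toy "models" represented as flat parameter dictionaries
old = {"w": 1.0, "b": 0.0}
new = {"w": 3.0, "b": 2.0}
merged = average_weights(old, new, alpha=0.5)
```

In practice the same interpolation would be applied tensor-by-tensor to a real model's parameters; the choice of alpha trades off new-task performance against forgetting.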

Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training

no code implementations24 Oct 2022 Jinzi Qi, Hugo Van hamme

The scarcity of training data and the large speaker variation in dysarthric speech lead to poor accuracy and poor speaker generalization of spoken language understanding systems for dysarthric speech.

Spoken Language Understanding

Optimizing Temporal Resolution Of Convolutional Recurrent Neural Networks For Sound Event Detection

no code implementations18 Oct 2022 Wim Boes, Hugo Van hamme

This is significantly better than the performance obtained by the baseline model (0.527), which can effectively be attributed to the changes that were applied to the pooling operations of the network.

Event Detection Sound Event Detection +1

Multi-Source Transformer Architectures for Audiovisual Scene Classification

no code implementations18 Oct 2022 Wim Boes, Hugo Van hamme

With regard to the accuracy measure, our best model achieved a score of 77.1% on the validation data, which is about the same as the performance obtained by the baseline system (77.0%).

Classification Scene Classification

Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition

no code implementations14 Oct 2022 Jakob Poncelet, Hugo Van hamme

However, subtitles are not verbatim (i.e., exact) transcriptions of speech, so they cannot be used directly to improve an Automatic Speech Recognition (ASR) model.

Automatic Speech Recognition (ASR) +2

Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications

1 code implementation1 Oct 2022 Bastiaan Tamm, Helena Balabin, Rik Vandenberghe, Hugo Van hamme

Speech quality in online conferencing applications is typically assessed through human judgements in the form of the mean opinion score (MOS) metric.
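As context for the MOS metric mentioned above: it is simply the arithmetic mean of individual listeners' quality ratings, typically on a 1 (bad) to 5 (excellent) scale. The helper below is a trivial illustration, not part of the paper.

```python
def mean_opinion_score(ratings):
    """Mean opinion score: the arithmetic mean of individual listeners'
    quality ratings, typically on a 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)

mos = mean_opinion_score([4, 5, 3, 4])  # four listeners rate the same clip
```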

Impact of temporal resolution on convolutional recurrent networks for audio tagging and sound event detection

no code implementations26 Sep 2022 Wim Boes, Hugo Van hamme

Many state-of-the-art systems for audio tagging and sound event detection employ convolutional recurrent neural architectures.

Audio Tagging Event Detection +2

Multi-encoder attention-based architectures for sound recognition with partial visual assistance

no code implementations26 Sep 2022 Wim Boes, Hugo Van hamme

Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multimedia libraries.

Audio Tagging Event Detection +1

Relating the fundamental frequency of speech with EEG using a dilated convolutional network

no code implementations5 Jul 2022 Corentin Puffay, Jana Van Canneyt, Jonas Vanthornhout, Hugo Van hamme, Tom Francart

To investigate how speech is processed in the brain, we can model the relation between features of a natural speech signal and the corresponding recorded electroencephalogram (EEG).

EEG
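The paper models this relation with a dilated convolutional network; as a much simpler linear stand-in for the same forward-modelling idea, one can predict an EEG channel by convolving a speech feature (e.g., a fundamental-frequency feature) with a temporal response function. The `predict_eeg` helper and the toy signals below are illustrative assumptions, not the paper's model.

```python
def predict_eeg(feature, kernel):
    """Predict one EEG channel as a causal convolution of a speech
    feature with a temporal response function (a linear stand-in for
    the dilated convolutional model; names and shapes are toy-sized)."""
    return [
        sum(kernel[k] * feature[t - k] for k in range(len(kernel)) if t - k >= 0)
        for t in range(len(feature))
    ]

# toy example: an impulse in the feature reproduces the kernel
eeg = predict_eeg([1.0, 0.0, 0.0, 0.0], [2.0, 3.0, 1.0])
```

A real forward model would fit the kernel by regression between the feature and recorded EEG; here the convolution itself is the point.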

Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding

no code implementations28 Jun 2022 Pu Wang, Hugo Van hamme

End-to-end spoken language understanding (SLU) systems benefit from pretraining on large corpora, followed by fine-tuning on application-specific data.

Spoken Language Understanding

Using Adapters to Overcome Catastrophic Forgetting in End-to-End Automatic Speech Recognition

1 code implementation30 Mar 2022 Steven Vander Eeckt, Hugo Van hamme

In this paper, we aim to overcome CF for E2E ASR by inserting adapters, small modules with few parameters that allow a general model to be fine-tuned to a specific task, into our model.

Automatic Speech Recognition (ASR) +1
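The adapter mechanism described above can be sketched in a few lines: a small bottleneck network (down-projection, nonlinearity, up-projection) whose output is added back to the input through a residual connection, so only the adapter's few parameters need task-specific training. The `adapter` and `matvec` helpers and the toy weights are illustrative assumptions, not the paper's implementation.

```python
def matvec(M, x):
    # multiply a matrix (list of rows) by a vector
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def adapter(x, W_down, W_up):
    """Bottleneck adapter: project down, apply ReLU, project back up,
    and add a residual connection to the input (illustrative sketch)."""
    h = [max(0.0, v) for v in matvec(W_down, x)]  # down-projection + ReLU
    u = matvec(W_up, h)                           # up-projection
    return [xi + ui for xi, ui in zip(x, u)]      # residual connection

# toy example: 4-dim hidden state, bottleneck of size 2
x = [1.0, -1.0, 0.5, 2.0]
W_down = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]                  # 2x4 down-projection
W_up = [[0.1, 0.0], [0.0, 0.1],
        [0.0, 0.0], [0.0, 0.0]]                  # 4x2 up-projection
y = adapter(x, W_down, W_up)
```

Because of the residual connection, initializing `W_up` near zero makes the adapter start as an identity mapping, which keeps the frozen backbone's behavior intact at the beginning of fine-tuning.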

Continual Learning for Monolingual End-to-End Automatic Speech Recognition

1 code implementation17 Dec 2021 Steven Vander Eeckt, Hugo Van hamme

Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF).

Automatic Speech Recognition (ASR) +2

Comparison of Self-Supervised Speech Pre-Training Methods on Flemish Dutch

no code implementations29 Sep 2021 Jakob Poncelet, Hugo Van hamme

Recent research in speech processing exhibits a growing interest in unsupervised and self-supervised representation learning from unlabelled data to alleviate the need for large amounts of annotated data.

Automatic Speech Recognition (ASR) +2

Extracting Different Levels of Speech Information from EEG Using an LSTM-Based Model

2 code implementations17 Jun 2021 Mohammad Jalilpour Monesi, Bernd Accou, Tom Francart, Hugo Van hamme

Decoding the speech signal that a person is listening to from the human brain via electroencephalography (EEG) can help us understand how our auditory system works.

EEG

On the long-term learning ability of LSTM LMs

no code implementations16 Jun 2021 Wim Boes, Robbe Van Rompaey, Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq

We inspect the long-term learning ability of Long Short-Term Memory language models (LSTM LMs) by evaluating a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs and by analyzing its performance.

Sentence

A Study into Pre-training Strategies for Spoken Language Understanding on Dysarthric Speech

no code implementations15 Jun 2021 Pu Wang, Bagher BabaAli, Hugo Van hamme

The acoustic model is pre-trained in two stages: initialization with a corpus of normal speech and finetuning on a mixture of dysarthric and normal speech.

Automatic Speech Recognition (ASR) +2

Speech Disorder Classification Using Extended Factorized Hierarchical Variational Auto-encoders

no code implementations14 Jun 2021 Jinzi Qi, Hugo Van hamme

We show that an extension of the FHVAE model succeeds in better disentangling the content-related and sequence-related representations, but both representations are still required for best results on disorder type classification.

Classification Disentanglement +1

Audiovisual transfer learning for audio tagging and sound event detection

no code implementations9 Jun 2021 Wim Boes, Hugo Van hamme

We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection.

Audio Tagging Event Detection +2

Predicting speech intelligibility from EEG in a non-linear classification paradigm

no code implementations14 May 2021 Bernd Accou, Mohammad Jalilpour Monesi, Hugo Van hamme, Tom Francart

The accuracy of the model's match/mismatch predictions can be used as a proxy for speech intelligibility without subject-specific (re)training.

EEG

Pre-training for low resource speech-to-intent applications

no code implementations30 Mar 2021 Pu Wang, Hugo Van hamme

In this paper we combine the encoder of an end-to-end ASR system with the prior NMF/capsule network-based user-taught decoder, and investigate whether pre-training methodology can reduce training data requirements for the NMF and capsule network.

Decoder

CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization

1 code implementation19 Dec 2019 Jeroen Zegers, Hugo Van hamme

In this paper we propose a novel network for source separation using an encoder-decoder CNN and LSTM in parallel.

Decoder Multi-Speaker Source Separation

Practical applicability of deep neural networks for overlapping speaker separation

no code implementations19 Dec 2019 Pieter Appeltans, Jeroen Zegers, Hugo Van hamme

This paper examines the applicability in realistic scenarios of two deep learning based solutions to the overlapping speaker separation problem.

Clustering Deep Clustering +1

Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events

no code implementations2 Dec 2019 Wim Boes, Hugo Van hamme

We tackle the task of environmental event classification by drawing inspiration from the transformer neural network architecture used in machine translation.

General Classification Machine Translation +1

Information-Weighted Neural Cache Language Models for ASR

no code implementations24 Sep 2018 Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq

Neural cache language models (LMs) extend the idea of regular cache language models by making the cache probability dependent on the similarity between the current context and the context of the words in the cache.
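As background, the regular neural cache distribution that the paper extends can be sketched as follows, assuming hidden states are plain vectors: each cached word's probability is proportional to the exponentiated similarity between its stored hidden state and the current context. The `cache_prob` helper and the flatness parameter `theta` are illustrative names, not the paper's code.

```python
import math

def cache_prob(h_t, cache, theta=1.0):
    """Cache distribution: each cached word's probability is proportional
    to exp(theta * <h_i, h_t>), summed over the positions where the word
    occurred; theta controls the flatness of the distribution."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    scores = [(w, math.exp(theta * dot(h, h_t))) for w, h in cache]
    z = sum(s for _, s in scores)
    probs = {}
    for w, s in scores:
        probs[w] = probs.get(w, 0.0) + s / z
    return probs

# toy cache of (word, hidden state) pairs; the current context h_t
# is most similar to the states at which "the" occurred
cache = [("the", [1.0, 0.0]), ("cat", [0.0, 1.0]), ("the", [1.0, 0.0])]
p = cache_prob([1.0, 0.0], cache)
```

In a full LM this cache distribution would be interpolated with the base model's softmax output; the paper's contribution is to make the weighting depend on the information content of the cached words.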

Multi-scenario deep learning for multi-speaker source separation

1 code implementation24 Aug 2018 Jeroen Zegers, Hugo Van hamme

Furthermore, it is concluded that a single model, trained on different scenarios, is capable of matching the performance of scenario-specific models.

Multi-Speaker Source Separation

Memory Time Span in LSTMs for Multi-Speaker Source Separation

1 code implementation24 Aug 2018 Jeroen Zegers, Hugo Van hamme

With deep learning approaches becoming state-of-the-art in many speech (as well as non-speech) related machine learning tasks, efforts are being taken to delve into the neural networks which are often considered as a black box.

Multi-Speaker Source Separation

State Gradients for RNN Memory Analysis

no code implementations WS 2018 Lyan Verwimp, Hugo Van hamme, Vincent Renkens, Patrick Wambacq

We present a framework for analyzing what the state in RNNs remembers from its input embeddings.

Language Models of Spoken Dutch

no code implementations12 Sep 2017 Lyan Verwimp, Joris Pelemans, Marieke Lycke, Hugo Van hamme, Patrick Wambacq

One model is trained on all available data (46M word tokens), but we also trained models on a specific type of TV show or domain/topic.

Language Modelling Speech Recognition +1

Character-Word LSTM Language Models

no code implementations EACL 2017 Lyan Verwimp, Joris Pelemans, Hugo Van hamme, Patrick Wambacq

We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model.

Language Modelling
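One simple way to combine word- and character-level information, not necessarily the paper's exact architecture, is to concatenate the word embedding with embeddings of the word's first few characters before feeding the LSTM. The `char_word_embedding` helper, the padding scheme, and the toy embedding tables below are illustrative assumptions.

```python
def char_word_embedding(word, word_emb, char_emb, n_chars=3):
    """Concatenate a word embedding with the embeddings of the word's
    first n_chars characters, padding shorter words (a simplified sketch
    of a character-word input; all names and sizes are illustrative)."""
    chars = list(word[:n_chars])
    chars += ["<pad>"] * (n_chars - len(chars))
    vec = list(word_emb.get(word, word_emb["<unk>"]))
    for c in chars:
        vec += char_emb.get(c, char_emb["<pad>"])
    return vec

# toy embeddings: word vectors of size 2, character vectors of size 1
word_emb = {"cat": [0.3, 0.7], "<unk>": [0.0, 0.0]}
char_emb = {"c": [1.0], "a": [2.0], "t": [3.0], "<pad>": [0.0]}
v = char_word_embedding("cat", word_emb, char_emb)
```

The appeal of such an input is that character information is shared across the vocabulary, which is one way the parameter count of a purely word-level model can be reduced.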

SCALE: A Scalable Language Engineering Toolkit

1 code implementation LREC 2016 Joris Pelemans, Lyan Verwimp, Kris Demuynck, Hugo Van hamme, Patrick Wambacq

In this paper we present SCALE, a new Python toolkit that contains two extensions to n-gram language models.

Language Modelling

Joint Sound Source Separation and Speaker Recognition

no code implementations29 Apr 2016 Jeroen Zegers, Hugo Van hamme

It is shown how state-of-the-art multichannel NMF for blind source separation can be easily extended to incorporate speaker recognition.

Blind Source Separation Speaker Recognition

Speech Recognition Web Services for Dutch

no code implementations LREC 2014 Joris Pelemans, Kris Demuynck, Hugo Van hamme, Patrick Wambacq

In this paper we present 3 applications in the domain of Automatic Speech Recognition for Dutch, all of which are developed using our in-house speech recognition toolkit SPRAAK.

Automatic Speech Recognition (ASR) +1
