Search Results for author: Odette Scharenborg

Found 28 papers, 10 papers with code

Using Mixed Incentives to Document Xi’an Guanzhong

no code implementations NIDCP (LREC) 2022 Juhong Zhan, Yue Jiang, Christopher Cieri, Mark Liberman, Jiahong Yuan, Yiya Chen, Odette Scharenborg

This paper describes our use of mixed incentives and the citizen science portal LanguageARC to prepare, collect and quality control a large corpus of object namings for the purpose of providing speech data to document the under-represented Guanzhong dialect of Chinese spoken in the Shaanxi province in the environs of Xi’an.

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

no code implementations 15 Sep 2023 Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.

Audio-Visual Speech Recognition speech-recognition +2

Using Data Augmentations and VTLN to Reduce Bias in Dutch End-to-End Speech Recognition Systems

no code implementations 5 Jul 2023 Tanvina Patel, Odette Scharenborg

Speech technology has improved greatly for norm speakers, i.e., adult native speakers of a language without speech impediments or strong accents.

Anatomy Data Augmentation +2

Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models

1 code implementation 24 Jun 2022 Hang Ji, Tanvina Patel, Odette Scharenborg

Compared with MFCC, in the within-language scenario, the performance of these SSL speech pre-trained models on AF probing tasks achieved a maximum relative increase of 34.4%, and it resulted in the lowest PER of 10.2%.

Self-Supervised Learning

Manipulation of oral cancer speech using neural articulatory synthesis

no code implementations 31 Mar 2022 Bence Mark Halpern, Teja Rebernik, Thomas Tienkamp, Rob van Son, Michiel van den Brekel, Martijn Wieling, Max Witjes, Odette Scharenborg

We present an articulatory synthesis framework for the synthesis and manipulation of oral cancer speech for clinical decision making and alleviation of patient stress.

Decision Making

Modelling word learning and recognition using visually grounded speech

1 code implementation 14 Mar 2022 Danny Merkx, Sebastiaan Scholten, Stefan L. Frank, Mirjam Ernestus, Odette Scharenborg

We furthermore investigate whether vector quantisation, a technique for discrete representation learning, aids the model in the discovery and recognition of words.

Representation Learning speech-recognition +1

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

1 code implementation 26 Jan 2022 Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

no code implementations 13 Jan 2022 Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg

In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition.

Generative Adversarial Network speech-recognition +2

Towards Identity Preserving Normal to Dysarthric Voice Conversion

no code implementations 15 Oct 2021 Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda

We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity.

Data Augmentation Decision Making +3

An Objective Evaluation Framework for Pathological Speech Synthesis

no code implementations 1 Jul 2021 Bence Mark Halpern, Julian Fritsch, Enno Hermann, Rob van Son, Odette Scharenborg, Mathew Magimai.-Doss

The development of pathological speech systems is currently hindered by the lack of a standardised objective evaluation framework.

Speech Synthesis Voice Conversion

Pathological voice adaptation with autoencoder-based voice conversion

no code implementations 15 Jun 2021 Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velazquez, Odette Scharenborg

This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech, as in our approach, the voice conversion (VC) model does not need to be optimised for speech degradation but only for the speaker change.

Speech Synthesis Voice Conversion

Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

1 code implementation 2 Apr 2021 Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Odette Scharenborg

In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one to create a subword-discriminative representation that is more language-independent.

Acoustic Unit Discovery Clustering

Quantifying Bias in Automatic Speech Recognition

1 code implementation 28 Mar 2021 Siyuan Feng, Olya Kudina, Bence Mark Halpern, Odette Scharenborg

Practice and recent evidence suggest that the state-of-the-art (SotA) ASRs struggle with the large variation in speech due to, e.g., gender, age, speech impairment, race, and accents.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks

no code implementations 17 Dec 2020 Siyuan Feng, Odette Scharenborg

Taken together, the analyses showed that the two stages in our approach are both effective in capturing phoneme and AF information.

Self-Supervised Learning Transfer Learning

Show and Speak: Directly Synthesize Spoken Description of Images

1 code implementation 23 Oct 2020 Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg

This paper proposes a new model, referred to as the show and speak (SAS) model that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes.

How Phonotactics Affect Multilingual and Zero-shot ASR Performance

1 code implementation 22 Oct 2020 Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Evaluating Automatically Generated Phoneme Captions for Images

no code implementations 31 Jul 2020 Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg

For this, an Image2Speech system was first implemented, which generates image captions consisting of phoneme sequences.

Image Captioning

Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

no code implementations 25 Jul 2020 Siyuan Feng, Odette Scharenborg

Our system is less sensitive to the amount of training data when the training data exceeds 50 hours.

S2IGAN: Speech-to-Image Generation via Adversarial Learning

2 code implementations 14 May 2020 Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg

An estimated half of the world's languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies.

Image Generation

Investigating the Effect of Music and Lyrics on Spoken-Word Recognition

no code implementations 13 Mar 2018 Odette Scharenborg, Martha Larson

Music stretches with and without lyrics were sampled from the same song in order to control for factors beyond the presence of lyrics.

Towards capturing fine phonetic variation in speech using articulatory features

no code implementations Speech Communication 2007 Odette Scharenborg, Vincent Wan, Roger K. Moore

As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal.

speech-recognition Speech Recognition
