Search Results for author: Odette Scharenborg

Found 28 papers, 10 papers with code

Using Mixed Incentives to Document Xi’an Guanzhong

no code implementations • NIDCP (LREC) 2022 • Juhong Zhan, Yue Jiang, Christopher Cieri, Mark Liberman, Jiahong Yuan, Yiya Chen, Odette Scharenborg

This paper describes our use of mixed incentives and the citizen science portal LanguageARC to prepare, collect and quality control a large corpus of object namings for the purpose of providing speech data to document the under-represented Guanzhong dialect of Chinese spoken in the Shaanxi province in the environs of Xi’an.

Exploring data augmentation in bias mitigation against non-native-accented speech

no code implementations • 24 Dec 2023 • Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg

We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system.

Automatic Speech Recognition (ASR) +3

Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation

1 code implementation • 9 Nov 2023 • Zhaofeng Lin, Tanvina Patel, Odette Scharenborg

Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication.

Automatic Speech Recognition (ASR) +2

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.

Audio-Visual Speech Recognition speech-recognition +2

Using Data Augmentations and VTLN to Reduce Bias in Dutch End-to-End Speech Recognition Systems

no code implementations • 5 Jul 2023 • Tanvina Patel, Odette Scharenborg

Speech technology has improved greatly for norm speakers, i.e., adult native speakers of a language without speech impediments or strong accents.

Anatomy Data Augmentation +2

Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models

1 code implementation • 24 Jun 2022 • Hang Ji, Tanvina Patel, Odette Scharenborg

Compared with MFCC, in the within-language scenario, the performance of these SSL speech pre-trained models on AF probing tasks achieved a maximum relative increase of 34.4%, and it resulted in the lowest PER of 10.2%.

Self-Supervised Learning

Manipulation of oral cancer speech using neural articulatory synthesis

no code implementations • 31 Mar 2022 • Bence Mark Halpern, Teja Rebernik, Thomas Tienkamp, Rob van Son, Michiel van den Brekel, Martijn Wieling, Max Witjes, Odette Scharenborg

We present an articulatory synthesis framework for the synthesis and manipulation of oral cancer speech for clinical decision making and alleviation of patient stress.

Decision Making

Modelling word learning and recognition using visually grounded speech

1 code implementation • 14 Mar 2022 • Danny Merkx, Sebastiaan Scholten, Stefan L. Frank, Mirjam Ernestus, Odette Scharenborg

We furthermore investigate whether vector quantisation, a technique for discrete representation learning, aids the model in the discovery and recognition of words.

Representation Learning speech-recognition +1

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

1 code implementation • 26 Jan 2022 • Piotr Żelasko, Siyuan Feng, Laureano Moro-Velázquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.

Automatic Speech Recognition (ASR) +3

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

no code implementations • 13 Jan 2022 • Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg

In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition.

Generative Adversarial Network speech-recognition +2

Towards Identity Preserving Normal to Dysarthric Voice Conversion

no code implementations • 15 Oct 2021 • Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda

We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity.

Data Augmentation Decision Making +3

An Objective Evaluation Framework for Pathological Speech Synthesis

no code implementations • 1 Jul 2021 • Bence Mark Halpern, Julian Fritsch, Enno Hermann, Rob van Son, Odette Scharenborg, Mathew Magimai.-Doss

The development of pathological speech systems is currently hindered by the lack of a standardised objective evaluation framework.

Speech Synthesis Voice Conversion

Pathological voice adaptation with autoencoder-based voice conversion

no code implementations • 15 Jun 2021 • Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velázquez, Odette Scharenborg

This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech, as in our approach, the voice conversion (VC) model does not need to be optimised for speech degradation but only for the speaker change.

Speech Synthesis Voice Conversion

Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

1 code implementation • 2 Apr 2021 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Odette Scharenborg

In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one to create a subword-discriminative representation that is more language-independent.

Acoustic Unit Discovery Clustering

Quantifying Bias in Automatic Speech Recognition

1 code implementation • 28 Mar 2021 • Siyuan Feng, Olya Kudina, Bence Mark Halpern, Odette Scharenborg

Practice and recent evidence suggests that the state-of-the-art (SotA) ASRs struggle with the large variation in speech due to e.g., gender, age, speech impairment, race, and accents.

Automatic Speech Recognition (ASR) +1

The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks

no code implementations • 17 Dec 2020 • Siyuan Feng, Odette Scharenborg

Taken together, the analyses showed that the two stages in our approach are both effective in capturing phoneme and AF information.

Self-Supervised Learning Transfer Learning

Show and Speak: Directly Synthesize Spoken Description of Images

1 code implementation • 23 Oct 2020 • Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg

This paper proposes a new model, referred to as the show and speak (SAS) model that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes.

How Phonotactics Affect Multilingual and Zero-shot ASR Performance

1 code implementation • 22 Oct 2020 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.

Automatic Speech Recognition (ASR) +2

Evaluating Automatically Generated Phoneme Captions for Images

no code implementations • 31 Jul 2020 • Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg

For this, first an Image2Speech system was implemented which generates image captions consisting of phoneme sequences.

Image Captioning

Detecting and analysing spontaneous oral cancer speech in the wild

1 code implementation • 28 Jul 2020 • Bence Mark Halpern, Rob van Son, Michiel van den Brekel, Odette Scharenborg

3) We set baselines for an oral cancer speech detection task on this dataset.

BIG-bench Machine Learning

Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

no code implementations • 25 Jul 2020 • Siyuan Feng, Odette Scharenborg

Our system is less sensitive to training data amount when the training data is over 50 hours.

Learning to Recognise Words using Visually Grounded Speech

no code implementations • 31 May 2020 • Sebastiaan Scholten, Danny Merkx, Odette Scharenborg

We investigated word recognition in a Visually Grounded Speech model.

Image Retrieval Retrieval

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

no code implementations • 16 May 2020 • Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies.

Automatic Speech Recognition (ASR) +1

S2IGAN: Speech-to-Image Generation via Adversarial Learning

2 code implementations • 14 May 2020 • Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg

An estimated half of the world's languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies.

Image Generation

Investigating the Effect of Music and Lyrics on Spoken-Word Recognition

no code implementations • 13 Mar 2018 • Odette Scharenborg, Martha Larson

Music stretches with and without lyrics were sampled from the same song in order to control for factors beyond the presence of lyrics.

Bayesian Models for Unit Discovery on a Very Low Resource Language

no code implementations • 16 Feb 2018 • Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur

Developing speech technologies for low-resource languages has become a very active research field over the last decade.

Acoustic Unit Discovery Segmentation

Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

no code implementations • 14 Feb 2018 • Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography.

Towards capturing fine phonetic variation in speech using articulatory features

no code implementations • Speech communication 2007 • Odette Scharenborg, Vincent Wan, Roger K. Moore

As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal.

Speech Recognition
