no code implementations • NIDCP (LREC) 2022 • Juhong Zhan, Yue Jiang, Christopher Cieri, Mark Liberman, Jiahong Yuan, Yiya Chen, Odette Scharenborg
This paper describes our use of mixed incentives and the citizen science portal LanguageARC to prepare, collect and quality-control a large corpus of object namings, with the aim of providing speech data to document the under-represented Guanzhong dialect of Chinese spoken in Shaanxi province in the environs of Xi’an.
no code implementations • 14 Jan 2025 • Dimme de Groot, Baturalp Karslioglu, Odette Scharenborg, Jorge Martinez
In this paper, we propose a robust loudspeaker beamforming algorithm which is used to enhance the performance of voice-driven applications in scenarios where the loudspeakers introduce the majority of the noise, e.g., when music is playing loudly.
2 code implementations • 4 Dec 2024 • Luciana Ferrer, Odette Scharenborg, Tom Bäckström
If the data is incorrectly selected, the wrong metric is chosen for evaluation or the significance of the comparisons between models is overestimated, conclusions may be misleading or result in suboptimal development decisions.
1 code implementation • 26 Aug 2024 • Kalvin Chang, Yi-Hui Chou, Jiatong Shi, Hsuan-Ming Chen, Nicole Holliday, Odette Scharenborg, David R. Mortensen
Underperformance of ASR systems for speakers of African American Vernacular English (AAVE) and other marginalized language varieties is a well-documented phenomenon, and one that reinforces the stigmatization of these varieties.
Automatic Speech Recognition (ASR)
no code implementations • 24 Aug 2024 • Wiebke Hutiri, Tanvina Patel, Aaron Yi Ding, Odette Scharenborg
Detecting and mitigating bias in speaker verification systems is important, as datasets, processing choices and algorithms can lead to performance differences that systematically favour some groups of people while disadvantaging others.
no code implementations • 12 Jun 2024 • Yuanyuan Zhang, Zhengjun Yue, Tanvina Patel, Odette Scharenborg
State-of-the-art ASRs show suboptimal performance for child speech.
no code implementations • 24 Dec 2023 • Yuanyuan Zhang, Aaricia Herygers, Tanvina Patel, Zhengjun Yue, Odette Scharenborg
We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system.
Automatic Speech Recognition (ASR)
1 code implementation • 9 Nov 2023 • Zhaofeng Lin, Tanvina Patel, Odette Scharenborg
Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication.
Automatic Speech Recognition (ASR)
no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao
This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.
no code implementations • 5 Jul 2023 • Tanvina Patel, Odette Scharenborg
Speech technology has improved greatly for norm speakers, i.e., adult native speakers of a language without speech impediments or strong accents.
1 code implementation • 24 Jun 2022 • Hang Ji, Tanvina Patel, Odette Scharenborg
Compared with MFCC, in the within-language scenario, the performance of these SSL speech pre-trained models on AF probing tasks achieved a maximum relative increase of 34.4%, and it resulted in the lowest PER of 10.2%.
no code implementations • 31 Mar 2022 • Bence Mark Halpern, Teja Rebernik, Thomas Tienkamp, Rob van Son, Michiel van den Brekel, Martijn Wieling, Max Witjes, Odette Scharenborg
We present an articulatory synthesis framework for the synthesis and manipulation of oral cancer speech for clinical decision making and alleviation of patient stress.
1 code implementation • 14 Mar 2022 • Danny Merkx, Sebastiaan Scholten, Stefan L. Frank, Mirjam Ernestus, Odette Scharenborg
We furthermore investigate whether vector quantisation, a technique for discrete representation learning, aids the model in the discovery and recognition of words.
1 code implementation • 26 Jan 2022 • Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak
In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.
Automatic Speech Recognition (ASR)
no code implementations • 13 Jan 2022 • Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg
In this paper, we investigate several existing voice conversion methods and a new state-of-the-art generative adversarial network (GAN)-based voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition.
no code implementations • 15 Oct 2021 • Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda
We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity.
no code implementations • 1 Jul 2021 • Bence Mark Halpern, Julian Fritsch, Enno Hermann, Rob van Son, Odette Scharenborg, Mathew Magimai.-Doss
The development of pathological speech systems is currently hindered by the lack of a standardised objective evaluation framework.
no code implementations • 15 Jun 2021 • Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velazquez, Odette Scharenborg
This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech, as in our approach, the voice conversion (VC) model does not need to be optimised for speech degradation but only for the speaker change.
1 code implementation • 2 Apr 2021 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Odette Scharenborg
In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one to create a subword-discriminative representation that is more language-independent.
1 code implementation • 28 Mar 2021 • Siyuan Feng, Olya Kudina, Bence Mark Halpern, Odette Scharenborg
Practice and recent evidence suggest that the state-of-the-art (SotA) ASRs struggle with the large variation in speech due to, e.g., gender, age, speech impairment, race, and accents.
Automatic Speech Recognition (ASR)
no code implementations • 17 Dec 2020 • Siyuan Feng, Odette Scharenborg
Taken together, the analyses showed that the two stages in our approach are both effective in capturing phoneme and AF information.
1 code implementation • 23 Oct 2020 • Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg
This paper proposes a new model, referred to as the show and speak (SAS) model that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes.
1 code implementation • 22 Oct 2020 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.
Automatic Speech Recognition (ASR)
no code implementations • 31 Jul 2020 • Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg
For this, first an Image2Speech system was implemented which generates image captions consisting of phoneme sequences.
1 code implementation • 28 Jul 2020 • Bence Mark Halpern, Rob van Son, Michiel van den Brekel, Odette Scharenborg
3) We set baselines for an oral cancer speech detection task on this dataset.
no code implementations • 25 Jul 2020 • Siyuan Feng, Odette Scharenborg
Our system is less sensitive to training data amount when the training data is over 50 hours.
no code implementations • 31 May 2020 • Sebastiaan Scholten, Danny Merkx, Odette Scharenborg
We investigated word recognition in a Visually Grounded Speech model.
no code implementations • 16 May 2020 • Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies.
Automatic Speech Recognition (ASR)
2 code implementations • 14 May 2020 • Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg
An estimated half of the world's languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies.
no code implementations • 13 Mar 2018 • Odette Scharenborg, Martha Larson
Music stretches with and without lyrics were sampled from the same song in order to control for factors beyond the presence of lyrics.
no code implementations • 16 Feb 2018 • Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur
Developing speech technologies for low-resource languages has become a very active research field over the last decade.
no code implementations • 14 Feb 2018 • Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography.
no code implementations • Speech Communication 2007 • Odette Scharenborg, Vincent Wan, Roger K. Moore
As part of this work we are investigating automatic feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal.