1 code implementation • ACL 2022 • Aidan Pine, Dan Wells, Nathan Brinklow, Patrick Littell, Korin Richmond
This paper describes the motivation and development of speech synthesis systems for the purposes of language revitalization.
no code implementations • 26 Sep 2024 • Yujia Sun, Zeyu Zhao, Korin Richmond, Yuanchao Li
In this work, we revisit the acoustic similarity between emotional speech and music, starting with an analysis of the layerwise behavior of SSL models for Speech Emotion Recognition (SER) and Music Emotion Recognition (MER).
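As an illustration of the kind of layerwise analysis this entails, here is a minimal probing sketch, assuming a HuggingFace wav2vec2-style SSL encoder; the checkpoint name and mean-pooling choice are placeholders, not the paper's setup:

```python
# Sketch: extract per-layer SSL representations to compare SER/MER behaviour.
# Assumes a HuggingFace wav2vec2-style checkpoint; any SSL encoder exposing
# output_hidden_states works the same way.
import torch
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor

model_name = "facebook/wav2vec2-base"  # placeholder checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name).eval()

def layerwise_embeddings(waveform, sample_rate=16000):
    """Return one mean-pooled embedding per transformer layer."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states: tuple of (num_layers + 1) tensors [batch, frames, dim]
    return [h.mean(dim=1).squeeze(0) for h in out.hidden_states]

# A layerwise probe would then fit, e.g., a linear classifier on the
# embeddings of each layer separately and compare accuracy across layers.
```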
1 code implementation • 25 Sep 2024 • Zhichen Han, Tianqi Geng, Hui Feng, Jiahong Yuan, Korin Richmond, Yuanchao Li
Utilizing Self-Supervised Learning (SSL) models for Speech Emotion Recognition (SER) has proven effective, yet limited research has explored cross-lingual scenarios.
no code implementations • 15 Sep 2024 • Siqi Sun, Korin Richmond
Recent work has shown the feasibility and benefit of bootstrapping an integrated sequence-to-sequence (Seq2Seq) linguistic frontend from a traditional pipeline-based frontend for text-to-speech (TTS).
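The bootstrapping idea can be sketched as follows, under the assumption that a legacy pipeline frontend is available to label raw text; `pipeline_frontend` here is a hypothetical toy stand-in, not the paper's system:

```python
# Sketch of the bootstrapping idea: run an existing pipeline frontend over
# raw text to produce (text, linguistic-annotation) training pairs for a
# Seq2Seq model. `pipeline_frontend` is a hypothetical toy stand-in.

def pipeline_frontend(text: str) -> str:
    # Toy stand-in: a real frontend performs text normalisation, G2P,
    # phrasing, etc., and emits a phoneme/prosody sequence.
    return " ".join(f"/{word.lower()}/" for word in text.split())

def build_training_pairs(sentences):
    """Source/target pairs: raw text in, pipeline annotations out."""
    return [(s, pipeline_frontend(s)) for s in sentences]

pairs = build_training_pairs(["Hello world.", "Speech synthesis is fun."])
# A Seq2Seq model trained with cross-entropy on such pairs learns to map
# raw text directly to the pipeline's output in one integrated step.
```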
no code implementations • 13 Sep 2024 • Jinzuomu Zhong, Korin Richmond, Zhiba Su, Siqi Sun
While recent Zero-Shot Text-to-Speech (ZS-TTS) models have achieved high naturalness and speaker similarity, they fall short in accent fidelity and control.
1 code implementation • 13 Jun 2024 • Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi
Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks.
1 code implementation • 11 Sep 2023 • Jinzuomu Zhong, Yang Li, Hui Huang, Korin Richmond, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu
In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly improve the naturalness and controllability of synthesised speech.
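For illustration, explicit frame-level prosodic features of the kind such systems condition on (fundamental frequency and energy) could be extracted as in this sketch; the use of librosa, the hop size, and the pitch range are illustrative choices, not the paper's configuration:

```python
# Sketch: extract explicit frame-level prosodic features (F0 and energy)
# of the kind a controllable TTS model might condition on.
import librosa
import numpy as np

def prosodic_features(wav_path, hop_length=256):
    y, sr = librosa.load(wav_path, sr=None)
    # Fundamental frequency via probabilistic YIN; NaN in unvoiced frames.
    f0, _, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"),
        sr=sr, hop_length=hop_length)
    f0 = np.nan_to_num(f0)          # zero out unvoiced frames
    energy = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    n = min(len(f0), len(energy))   # align frame counts
    return f0[:n], energy[:n]
```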
1 code implementation • 22 Sep 2022 • Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts, Korin Richmond, Gustav Eje Henter
While previous work has focused on predicting listeners' ratings (mean opinion scores) of individual stimuli, we focus on the simpler task of predicting subjective preference given two speech stimuli for the same text.
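A minimal sketch of such a pairwise setup is a Siamese-style scorer with shared weights, trained on which of the two stimuli listeners preferred; the architecture below is an assumption for illustration, not the paper's model:

```python
# Sketch: a Siamese-style preference predictor. The same scoring network
# rates both stimuli; the score difference is trained to match which of
# the two listeners preferred. Layer sizes are illustrative.
import torch
import torch.nn as nn

class PreferenceModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=128):
        super().__init__()
        self.scorer = nn.Sequential(        # shared weights for both inputs
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, feats_a, feats_b):
        # feats_*: [batch, frames, feat_dim]; mean-pool over time, then score.
        score_a = self.scorer(feats_a.mean(dim=1))
        score_b = self.scorer(feats_b.mean(dim=1))
        return score_a - score_b  # > 0 means stimulus A preferred

model = PreferenceModel()
a, b = torch.randn(4, 100, 80), torch.randn(4, 100, 80)
logits = model(a, b)
# Train with binary cross-entropy against "A preferred" labels.
loss = nn.functional.binary_cross_entropy_with_logits(
    logits.squeeze(1), torch.ones(4))
```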
no code implementations • 31 May 2021 • Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals
Our results demonstrate the strength of our approach and its ability to generalise to data from new domains.
no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
We observe that silent speech recognition from imaging data underperforms compared to modal speech recognition, likely due to a speaking-mode mismatch between training and testing.
no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals
For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio.
no code implementations • 19 Nov 2020 • Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals
We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos.
1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James Scobbie, Alan Wrench
In addition, it includes a set of annotations, some manual and some automatically produced, and software tools to process, transform and visualise the data.
1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals
Audiovisual synchronisation is the task of determining the time offset between speech audio and a video recording of the articulators.
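One simple baseline for this task (not the paper's method) is to cross-correlate per-frame activity features from the two streams and pick the best-scoring lag, as in this sketch; both feature sequences are assumed to be at the same frame rate:

```python
# Sketch: estimate an audio/video offset by cross-correlating per-frame
# "activity" features from the two streams (a simple baseline, not the
# paper's model).
import numpy as np

def estimate_offset(audio_feats, video_feats, max_lag=50):
    """Return the lag (in frames) that best aligns the two sequences."""
    a = (audio_feats - audio_feats.mean()) / (audio_feats.std() + 1e-8)
    v = (video_feats - video_feats.mean()) / (video_feats.std() + 1e-8)
    def corr(lag):
        if lag >= 0:
            x, y = a[lag:], v[:len(v) - lag]
        else:
            x, y = a[:len(a) + lag], v[-lag:]
        n = min(len(x), len(y))
        return float(np.dot(x[:n], y[:n])) / max(n, 1)
    return max(range(-max_lag, max_lag + 1), key=corr)

# Example: a synthetic 10-frame shift is recovered.
sig = np.random.default_rng(0).standard_normal(500)
print(estimate_offset(sig, np.roll(sig, -10)))  # -> 10
```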
1 code implementation • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words.
no code implementations • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production.
1 code implementation • 31 Oct 2018 • Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
In this work, we propose a replay attack detection system, the Attentive Filtering Network, composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, followed by a ResNet-based classifier.
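A heavily simplified sketch of the attention-based filtering idea follows: a small network predicts a time-frequency mask that re-weights the input spectrogram before classification. The layer sizes and the placeholder classifier are assumptions, not the paper's exact architecture:

```python
# Sketch: attention-based filtering of a log-spectrogram. A small conv net
# predicts a time-frequency attention mask; the enhanced features would go
# to a downstream classifier (a ResNet in the paper; omitted here).
import torch
import torch.nn as nn

class AttentiveFilter(nn.Module):
    def __init__(self):
        super().__init__()
        self.mask_net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, spec):
        # spec: [batch, 1, freq, time]; mask in (0, 1) re-weights each bin.
        mask = self.mask_net(spec)
        return spec * mask          # element-wise attentive filtering

filt = AttentiveFilter()
spec = torch.randn(2, 1, 257, 400)     # toy log-spectrogram batch
enhanced = filt(spec)                  # same shape, attention-weighted
# In the full system, `enhanced` feeds a ResNet producing a
# genuine-vs-replay decision.
```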
2 code implementations • 15 Dec 2016 • Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond
We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately.
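The separation of anatomy and pose can be illustrated with a Tucker-style multilinear sketch, where a core tensor is contracted with separate speaker and pose weight vectors; all dimensions below are illustrative, not the paper's:

```python
# Sketch: a Tucker-style multilinear shape model. A core tensor is
# contracted with a speaker (anatomy) weight vector and a pose weight
# vector to produce a tongue mesh.
import numpy as np

n_vertices = 1000                     # mesh points (x, y, z flattened)
speaker_dims, pose_dims = 8, 5        # modes of variation
mean_shape = np.zeros(3 * n_vertices)
core = np.random.randn(3 * n_vertices, speaker_dims, pose_dims)

def synthesise_tongue(speaker_w, pose_w):
    """Reconstruct a tongue shape from separate anatomy and pose weights."""
    offset = np.einsum("vsp,s,p->v", core, speaker_w, pose_w)
    return mean_shape + offset

# Varying pose_w with speaker_w fixed changes articulation only;
# varying speaker_w changes anatomy only.
shape = synthesise_tongue(np.random.randn(speaker_dims),
                          np.random.randn(pose_dims))
```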
no code implementations • 4 Sep 2015 • Alexander Hewer, Ingmar Steiner, Timo Bolkart, Stefanie Wuhrer, Korin Richmond
The palate model is then tested using 3D MRI data from another corpus and evaluated using a high-resolution optical scan.