Search Results for author: Korin Richmond

Found 19 papers, 10 with code

Revisiting Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations

no code implementations • 26 Sep 2024 • Yujia Sun, Zeyu Zhao, Korin Richmond, Yuanchao Li

In this work, we revisit the acoustic similarity between emotional speech and music, starting with an analysis of the layerwise behavior of SSL models for Speech Emotion Recognition (SER) and Music Emotion Recognition (MER).

Domain Generalization Music Emotion Recognition +3
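
The layerwise analysis described above lends itself to a simple probing recipe: pull the hidden states out of every transformer layer of an SSL model and compare them across domains. A minimal sketch in Python, assuming a wav2vec 2.0 backbone from Hugging Face transformers; the paper's actual models, pooling, and probing setup are not specified in this excerpt.

```python
import torch
from transformers import Wav2Vec2Model

# Extract one pooled embedding per layer from a wav2vec 2.0 model.
# Model choice and mean-pooling are illustrative assumptions.
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

wav = torch.randn(1, 16000)  # dummy 1-second waveform at 16 kHz
with torch.no_grad():
    out = model(wav, output_hidden_states=True)

# out.hidden_states holds (1 + num_layers) tensors of shape [batch, frames, dim]
layer_embs = [h.mean(dim=1).squeeze(0) for h in out.hidden_states]
# Layerwise behaviour can then be compared across domains, e.g. cosine
# similarity between a speech clip and a music clip at each layer.
```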

Cross-lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models

1 code implementation • 25 Sep 2024 • Zhichen Han, Tianqi Geng, Hui Feng, Jiahong Yuan, Korin Richmond, Yuanchao Li

Utilizing Self-Supervised Learning (SSL) models for Speech Emotion Recognition (SER) has proven effective, yet limited research has explored cross-lingual scenarios.

parameter-efficient fine-tuning Self-Supervised Learning +2
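
The parameter-efficient fine-tuning tag suggests adapting the SSL model without updating its full parameter set. One common form is to freeze the backbone and train only a small head on pooled features; the sketch below assumes a wav2vec 2.0 encoder and a four-way emotion label set, neither of which is confirmed by this excerpt.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

backbone = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
for p in backbone.parameters():
    p.requires_grad = False  # backbone stays frozen

n_emotions = 4  # hypothetical label set, e.g. angry/happy/sad/neutral
head = nn.Linear(backbone.config.hidden_size, n_emotions)  # only trainable part

wav = torch.randn(1, 16000)  # dummy 1-second waveform at 16 kHz
feats = backbone(wav).last_hidden_state.mean(dim=1)  # mean-pool over time
logits = head(feats)  # train the head with cross-entropy as usual
```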

Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning

no code implementations • 15 Sep 2024 • Siqi Sun, Korin Richmond

Recent work has shown the feasibility and benefit of bootstrapping an integrated sequence-to-sequence (Seq2Seq) linguistic frontend from a traditional pipeline-based frontend for text-to-speech (TTS).

Multi-Task Learning Text to Speech
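
As a rough illustration of the multi-task idea, a shared text encoder can feed both the main grapheme-to-phoneme objective and an auxiliary objective whose labels come from transcribed speech audio. The sketch below is schematic: the architecture, label inventories, and the 0.5 loss weight are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskFrontend(nn.Module):
    """Shared encoder with a main phoneme head and an auxiliary head."""
    def __init__(self, vocab=100, hidden=128, n_phones=60):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.phone_head = nn.Linear(hidden, n_phones)  # main G2P task
        self.aux_head = nn.Linear(hidden, n_phones)    # audio-derived auxiliary task

    def forward(self, chars):
        h, _ = self.encoder(self.embed(chars))
        return self.phone_head(h), self.aux_head(h)

model = MultiTaskFrontend()
chars = torch.randint(0, 100, (2, 12))  # dummy character IDs
main_logits, aux_logits = model(chars)
main_tgt = torch.randint(0, 60, (2, 12))
aux_tgt = torch.randint(0, 60, (2, 12))
loss = F.cross_entropy(main_logits.transpose(1, 2), main_tgt) \
     + 0.5 * F.cross_entropy(aux_logits.transpose(1, 2), aux_tgt)
```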

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation

no code implementations • 13 Sep 2024 • Jinzuomu Zhong, Korin Richmond, Zhiba Su, Siqi Sun

While recent Zero-Shot Text-to-Speech (ZS-TTS) models have achieved high naturalness and speaker similarity, they fall short in accent fidelity and control.

Text to Speech

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

1 code implementation • 11 Sep 2023 • Jinzuomu Zhong, Yang Li, Hui Huang, Korin Richmond, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu

In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly improve the naturalness and controllability of synthesised speech.

Text to Speech
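
The contrastive pretraining named in the title typically pulls matched pairs of embeddings together while pushing mismatched pairs apart. Below is a minimal symmetric InfoNCE loss, assuming paired text-span and speech-prosody embeddings; the paper's actual SSWP pairing and loss are not detailed in this excerpt.

```python
import torch
import torch.nn.functional as F

def info_nce(text_emb, speech_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.
    Matched pairs sit on the diagonal of the similarity matrix."""
    text_emb = F.normalize(text_emb, dim=1)
    speech_emb = F.normalize(speech_emb, dim=1)
    logits = text_emb @ speech_emb.T / temperature
    targets = torch.arange(len(logits))
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2
```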

Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks

1 code implementation • 22 Sep 2022 • Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts, Korin Richmond, Gustav Eje Henter

While previous work has focused on predicting listeners' ratings (mean opinion scores) of individual stimuli, we focus on the simpler task of predicting subjective preference given two speech stimuli for the same text.
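
An anti-symmetric twin network can be built so that swapping the two stimuli flips the sign of the predicted preference by construction: both branches share weights, and the score is a difference of per-branch outputs. A minimal PyTorch sketch; feature and layer sizes are placeholders, and the published architecture may differ.

```python
import torch
import torch.nn as nn

class AntiSymmetricTwin(nn.Module):
    """Preference score s(a, b) = h(g(a)) - h(g(b)), so s(a, b) = -s(b, a)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 32))
        self.head = nn.Linear(32, 1)

    def forward(self, a, b):
        return self.head(self.encoder(a)) - self.head(self.encoder(b))

net = AntiSymmetricTwin()
a, b = torch.randn(4, 128), torch.randn(4, 128)
assert torch.allclose(net(a, b), -net(b, a))  # anti-symmetry holds exactly
```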

Automatic audiovisual synchronisation for ultrasound tongue imaging

no code implementations • 31 May 2021 • Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals

Our results demonstrate the strength of our approach and its ability to generalise to data from new domains.

Silent versus modal multi-speaker speech recognition from ultrasound and video

no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

We observe that silent speech recognition from imaging data underperforms compared to modal speech recognition, likely due to a speaking-mode mismatch between training and testing.

Speech Recognition

Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors

no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals

For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio.

UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James Scobbie, Alan Wrench

In addition, it includes a set of annotations, some manual and some automatically produced, and software tools to process, transform and visualise the data.

Synchronising audio and ultrasound by learning cross-modal embeddings

1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals

Audiovisual synchronisation is the task of determining the time offset between speech audio and a video recording of the articulators.
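
Once both streams are projected into a shared embedding space, the offset can be recovered by sliding one stream against the other and scoring each candidate shift. The sketch below assumes per-frame embeddings at a common frame rate and uses mean cosine similarity as the score; the paper's actual scoring and search procedure are not given in this excerpt.

```python
import numpy as np

def best_offset(audio_emb, video_emb, max_shift=50):
    """Return the frame shift that maximises mean cosine similarity
    between two [frames, dim] embedding sequences."""
    def score(a, v):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        v = v / np.linalg.norm(v, axis=1, keepdims=True)
        return (a * v).sum(axis=1).mean()

    shifts = list(range(-max_shift, max_shift + 1))
    scores = []
    for s in shifts:
        a = audio_emb[max(s, 0):]
        v = video_emb[max(-s, 0):]
        n = min(len(a), len(v))  # assumes sequences longer than max_shift
        scores.append(score(a[:n], v[:n]))
    return shifts[int(np.argmax(scores))]
```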

Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions

1 code implementation • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words.

Speaker Diarization +1

Speaker-independent classification of phonetic segments from raw ultrasound in child speech

no code implementations • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production.

General Classification

Attentive Filtering Networks for Audio Replay Attack Detection

1 code implementation • 31 Oct 2018 • Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King

In this work, we propose our replay attack detection system, the Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier.

Speaker Verification
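
The attention-based filtering can be pictured as a learned soft mask applied elementwise to a time-frequency feature map before it reaches the classifier. A schematic PyTorch sketch; the published network's mask architecture and ResNet-based classifier are not reproduced here.

```python
import torch
import torch.nn as nn

class AttentiveFilter(nn.Module):
    """Learns a sigmoid mask over a spectrogram-like input and applies it
    elementwise, enhancing some time-frequency regions and suppressing others."""
    def __init__(self, channels=1):
        super().__init__()
        self.mask_net = nn.Sequential(
            nn.Conv2d(channels, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(8, channels, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, spec):               # spec: [batch, channels, freq, time]
        return spec * self.mask_net(spec)  # filtered features go to a classifier

spec = torch.randn(2, 1, 257, 400)  # dummy log-spectrogram batch
filtered = AttentiveFilter()(spec)
```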

A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

2 code implementations • 15 Dec 2016 • Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond

We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately.

Image Denoising Image Segmentation +1
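
A multilinear shape model of this kind factorises variation along separate modes, so a tongue mesh is generated from independent anatomy and pose coefficients. A toy sketch with placeholder dimensions; in the paper the core tensor is learned from MRI-derived meshes rather than sampled randomly.

```python
import numpy as np

n_verts, d_anat, d_pose = 3000, 10, 5  # placeholder sizes
mean = np.zeros(3 * n_verts)           # mean tongue shape (flattened x, y, z)
core = np.random.randn(d_anat, d_pose, 3 * n_verts)  # learned in practice

def synthesise(w_anat, w_pose):
    """Reconstruct a mesh from separate anatomy and pose coefficients."""
    return mean + np.einsum("i,j,ijk->k", w_anat, w_pose, core)

mesh = synthesise(0.1 * np.random.randn(d_anat), 0.1 * np.random.randn(d_pose))
```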

A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract

no code implementations • 4 Sep 2015 • Alexander Hewer, Ingmar Steiner, Timo Bolkart, Stefanie Wuhrer, Korin Richmond

The palate model is then tested using 3D MRI from another corpus and evaluated using a high-resolution optical scan.
