Search Results for author: Erica Cooper

Found 21 papers, 11 papers with code

Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction

no code implementations • 25 Dec 2023 • Aditya Ravuri, Erica Cooper, Junichi Yamagishi

Predicting audio quality in voice synthesis and conversion systems is a critical yet challenging task, especially when traditional methods like Mean Opinion Scores (MOS) are cumbersome to collect at scale.

Self-Supervised Learning

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

no code implementations • 4 Oct 2023 • Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi

We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech.

Speech Synthesis · Text-To-Speech Synthesis

Range-Based Equal Error Rate for Spoof Localization

1 code implementation • 28 May 2023 • Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi

To properly measure misclassified ranges and better evaluate spoof localization performance, we upgrade point-based EER to range-based EER.
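The range-based EER this paper proposes builds on the conventional point-based equal error rate. As a minimal illustration of the point-based baseline only (function name and label convention are mine, not the paper's), the EER is the operating point where the false acceptance rate on spoofed segments equals the false rejection rate on bona fide segments:

```python
def equal_error_rate(scores, labels):
    """Point-based EER: threshold where false acceptance rate (FAR)
    and false rejection rate (FRR) are closest, reported as their mean.
    scores: higher means more likely bona fide; labels: 1 = bona fide, 0 = spoof."""
    spoof = [s for s, l in zip(scores, labels) if l == 0]
    bona = [s for s, l in zip(scores, labels) if l == 1]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in spoof) / len(spoof)  # spoof accepted as bona fide
        frr = sum(s < t for s in bona) / len(bona)     # bona fide rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

The range-based upgrade described in the paper instead scores misclassified time ranges rather than individual decision points; this sketch covers only the point-based quantity being upgraded.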

Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech

no code implementations • 17 May 2023 • Erica Cooper, Junichi Yamagishi

Mean Opinion Score (MOS) is a popular measure for evaluating synthesized speech.

Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances

no code implementations • 1 Sep 2022 • Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi

Conventional automatic speaker verification systems can usually be decomposed into a front-end model such as time delay neural network (TDNN) for extracting speaker embeddings and a back-end model such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA) for similarity scoring.

Data Augmentation · Speaker Verification

The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance

no code implementations • 11 Apr 2022 • Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi

Since the short spoofed speech segments to be embedded by attackers are of variable length, six different temporal resolutions are considered, ranging from as short as 20 ms to as long as 640 ms. Third, we propose a new CM that enables the simultaneous use of segment-level labels at different temporal resolutions as well as utterance-level labels, to perform utterance- and segment-level detection at the same time.

Speaker Verification · Speech Synthesis +2

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech

1 code implementation • 18 Oct 2021 • Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda

An effective approach to automatically predict the subjective rating for synthetic speech is to train on a listening test dataset with human-annotated scores.

Voice Conversion

Generalization Ability of MOS Prediction Networks

1 code implementation • 6 Oct 2021 • Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi

Automatic methods to predict listener opinions of synthesized speech remain elusive since listeners, systems being evaluated, characteristics of the speech, and even the instructions given and the rating scale all vary from test to test.

Use of speaker recognition approaches for learning and evaluating embedding representations of musical instrument sounds

1 code implementation • 24 Jul 2021 • Xuan Shi, Erica Cooper, Junichi Yamagishi

Constructing an embedding space for musical instrument sounds that can meaningfully represent new and unseen instruments is important for downstream music generation tasks such as multi-instrument synthesis and timbre transfer.

Data Augmentation · Instrument Recognition +4

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE

1 code implementation • 4 May 2021 • Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi

This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data.

Disentanglement

An Initial Investigation for Detecting Partially Spoofed Audio

no code implementations • 6 Apr 2021 • Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi, Jose Patino, Nicholas Evans

By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with entirely spoofed utterances.

Voice Anti-spoofing

Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

1 code implementation • 4 Apr 2021 • Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi

Probabilistic linear discriminant analysis (PLDA) or cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities.

Speaker Verification
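The cosine-similarity back-end mentioned in this abstract, which the paper's attention back-end replaces, can be sketched as follows. This is a generic illustration of the conventional baseline with multiple enrollment utterances (averaging enrollments is a common convention; the function name is mine, not the paper's):

```python
import math

def cosine_score(enroll_embeddings, test_embedding):
    """Average multiple enrollment embeddings into one speaker model,
    then score the test utterance by cosine similarity."""
    n = len(enroll_embeddings)
    # Element-wise mean of the enrollment embeddings.
    enroll = [sum(vec[i] for vec in enroll_embeddings) / n
              for i in range(len(test_embedding))]
    dot = sum(a * b for a, b in zip(enroll, test_embedding))
    norm_e = math.sqrt(sum(a * a for a in enroll))
    norm_t = math.sqrt(sum(b * b for b in test_embedding))
    return dot / (norm_e * norm_t)
```

The paper's contribution is a trainable attention back-end in place of this fixed, non-trainable scoring rule; the sketch shows only the baseline being compared against.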

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

no code implementations • 10 Nov 2020 • Erica Cooper, Xin Wang, Yi Zhao, Yusuke Yasuda, Junichi Yamagishi

We explore pretraining strategies including choice of base corpus with the aim of choosing the best strategy for zero-shot multi-speaker end-to-end synthesis.

Speech Synthesis

An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems

no code implementations • 21 Oct 2020 • Antoine Perquin, Erica Cooper, Junichi Yamagishi

Thanks to this property, we show that grapheme embeddings learned by Tacotron models can be useful for tasks such as grapheme-to-phoneme conversion and control of the pronunciation in synthetic speech.

Relation · Speech Synthesis +1

Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm

1 code implementation • 21 Oct 2020 • Jennifer Williams, Yi Zhao, Erica Cooper, Junichi Yamagishi

Additionally, phones can be recognized from sub-phone VQ codebook indices in our semi-supervised VQ-VAE better than self-supervised with global conditions.

speaker-diarization · Speaker Diarization +1

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

1 code implementation • 4 May 2020 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi

This is followed by an analysis on synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach.

Speech Synthesis

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

3 code implementations • 23 Oct 2019 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.

Audio and Speech Processing
