Search Results for author: Chung-Ming Chien

Found 9 papers, 5 papers with code

Few-Shot Spoken Language Understanding via Joint Speech-Text Models

no code implementations • 9 Oct 2023 • Chung-Ming Chien, Mingjiamei Zhang, Ju-chieh Chou, Karen Livescu

Recent work on speech representation models jointly pre-trained with text has shown that speech representations can be improved by encoding speech and text in a shared space (a minimal sketch of this idea follows below).

Named Entity Recognition +2
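
The shared-space idea described above can be illustrated with a minimal sketch (assumptions, not the paper's code): a speech encoder and a text encoder map inputs to the same embedding space, so a single classifier head trained largely on labeled text can be reused on speech with only a few labeled examples. SpeechEncoder and TextEncoder are hypothetical stand-ins for a jointly pre-trained speech-text model.

    # Minimal sketch of few-shot SLU via a shared speech-text space.
    # The encoder modules are hypothetical stand-ins for a jointly
    # pre-trained speech-text model; this is not the paper's code.
    import torch.nn as nn

    class SharedSpaceSLU(nn.Module):
        def __init__(self, speech_encoder: nn.Module, text_encoder: nn.Module,
                     embed_dim: int, num_labels: int):
            super().__init__()
            self.speech_encoder = speech_encoder  # waveform -> (B, embed_dim)
            self.text_encoder = text_encoder      # token ids -> (B, embed_dim)
            self.classifier = nn.Linear(embed_dim, num_labels)  # shared head

        def forward_text(self, token_ids):
            # Train the head on plentiful labeled text...
            return self.classifier(self.text_encoder(token_ids))

        def forward_speech(self, waveform):
            # ...then reuse it on speech, since both encoders share a space.
            return self.classifier(self.speech_encoder(waveform))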

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

no code implementations • 14 Sep 2023 • Ju-chieh Chou, Chung-Ming Chien, Karen Livescu

In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data.

Resynthesis, Speech Enhancement

What Do Self-Supervised Speech Models Know About Words?

1 code implementation • 30 Jun 2023 • Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu

Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks.

Sentence, Sentence Similarity +1

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

no code implementations • 16 Feb 2022 • Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

It uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system, marking a conceptual shift in the TTS paradigm by framing the few-shot TTS problem as a VC task (see the pipeline sketch below).

Speech Synthesis, Voice Conversion
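
As a rough pipeline sketch of the idea (hypothetical tts_system and vc_module objects, not the paper's API): the existing TTS system synthesizes in its own high-quality voice, and the VC module post-processes the result toward the target speaker using the few-shot references.

    # Few-shot TTS framed as voice conversion: a two-stage pipeline sketch.
    # `tts_system` and `vc_module` are hypothetical objects, not a real API.
    def voice_filter_tts(text, target_reference_wavs, tts_system, vc_module):
        # Step 1: synthesize with the pre-existing TTS system in its own voice.
        source_wav = tts_system.synthesize(text)
        # Step 2: convert the synthesized speech toward the target speaker,
        # conditioned on only a few reference recordings.
        return vc_module.convert(source_wav, target_reference_wavs)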

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

3 code implementations • 7 Apr 2021 • Jheng-Hao Lin, Yist Y. Lin, Chung-Ming Chien, Hung-Yi Lee

AUTOVC used d-vectors to extract speaker information, while FragmentVC used self-supervised learning (SSL) features such as wav2vec 2.0 to extract phonetic content information (both feature streams are sketched below).

Self-Supervised Learning, Voice Conversion
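
The two feature streams mentioned in the abstract can be reproduced with public tooling (torchaudio and the third-party resemblyzer package serve as stand-ins here, not the S2VC repository itself): wav2vec 2.0 features for phonetic content, and a d-vector-style speaker embedding.

    # Content features via wav2vec 2.0 (torchaudio) and a d-vector-style
    # speaker embedding (resemblyzer). Illustrative, not the S2VC codebase.
    import torch
    import torchaudio
    from resemblyzer import VoiceEncoder, preprocess_wav

    bundle = torchaudio.pipelines.WAV2VEC2_BASE
    w2v2 = bundle.get_model().eval()
    waveform, sr = torchaudio.load("source.wav")       # assumed mono file
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.inference_mode():
        layer_feats, _ = w2v2.extract_features(waveform)
    content = layer_feats[-1]                          # (1, frames, 768)

    dvector = VoiceEncoder().embed_utterance(preprocess_wav("target.wav"))
    print(content.shape, dvector.shape)                # d-vector: (256,)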

Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

1 code implementation • 6 Mar 2021 • Chung-Ming Chien, Jheng-Hao Lin, Chien-yu Huang, Po-chun Hsu, Hung-Yi Lee

The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances whose voice and speaking style resemble those of a reference speaker, given only a few reference samples (a rough conditioning sketch follows below).

Voice Cloning, Voice Conversion
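
As an assumption-laden illustration of the title's idea (not the released code), a conditioning module can concatenate a fixed pretrained speaker embedding, such as a d-vector, with a learnable per-speaker embedding and project the result to the TTS model's conditioning size.

    # Hypothetical sketch: fuse a pretrained speaker embedding with a
    # learnable per-speaker table to condition a multi-speaker TTS model.
    import torch
    import torch.nn as nn

    class SpeakerConditioner(nn.Module):
        def __init__(self, num_speakers, pretrained_dim=256,
                     learnable_dim=128, cond_dim=256):
            super().__init__()
            self.table = nn.Embedding(num_speakers, learnable_dim)
            self.proj = nn.Linear(pretrained_dim + learnable_dim, cond_dim)

        def forward(self, pretrained_emb, speaker_id):
            # pretrained_emb: (B, pretrained_dim), e.g. a frozen d-vector.
            fused = torch.cat([pretrained_emb, self.table(speaker_id)], dim=-1)
            return self.proj(fused)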

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

1 code implementation • 12 Nov 2020 • Chung-Ming Chien, Hung-Yi Lee

Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks.

Speech Synthesis

FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention

2 code implementations • 27 Oct 2020 • Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, Hung-Yi Lee, Lin-shan Lee

Any-to-any voice conversion aims to convert voices between arbitrary speakers, including speakers unseen during training; this is much more challenging than one-to-one or many-to-many conversion, but much more attractive in real-world scenarios (a cross-attention sketch follows below).

Disentanglement, Speaker Verification +1
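
The fragment extraction and fusing in the title can be sketched with ordinary cross-attention (an illustration under that assumption, not the released FragmentVC model): source content features act as queries over frames of the target speaker's audio, so the output is assembled from target-voice fragments that match the source content.

    # Cross-attention sketch of fragment fusing; shapes are illustrative.
    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)
    src_content = torch.randn(1, 200, 768)    # content features of source speech
    tgt_fragments = torch.randn(1, 500, 768)  # frames from the target speaker
    fused, weights = attn(src_content, tgt_fragments, tgt_fragments)
    # `fused` (1, 200, 768) would then be decoded to a spectrogram/waveform.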
