Search Results for author: Chung-Ming Chien

Found 9 papers, 5 papers with code

Few-Shot Spoken Language Understanding via Joint Speech-Text Models

no code implementations • 9 Oct 2023 • Chung-Ming Chien, Mingjiamei Zhang, Ju-chieh Chou, Karen Livescu

Recent work on speech representation models jointly pre-trained with text has shown that speech representations can be improved by encoding speech and text in a shared space (a minimal sketch of this idea follows below).

Named Entity Recognition +2
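
The shared-space idea described above can be illustrated with a minimal sketch (assumptions, not the paper's code): a speech encoder and a text encoder map inputs to the same embedding space, so a single classifier head trained largely on labeled text can be reused on speech with only a few labeled examples. SpeechEncoder and TextEncoder are hypothetical stand-ins for a jointly pre-trained speech-text model.

    # Minimal sketch of few-shot SLU via a shared speech-text space.
    # The encoder modules are hypothetical stand-ins for a jointly
    # pre-trained speech-text model; this is not the paper's code.
    import torch.nn as nn

    class SharedSpaceSLU(nn.Module):
        def __init__(self, speech_encoder: nn.Module, text_encoder: nn.Module,
                     embed_dim: int, num_labels: int):
            super().__init__()
            self.speech_encoder = speech_encoder  # waveform -> (B, embed_dim)
            self.text_encoder = text_encoder      # token ids -> (B, embed_dim)
            self.classifier = nn.Linear(embed_dim, num_labels)  # shared head

        def forward_text(self, token_ids):
            # Train the head on plentiful labeled text...
            return self.classifier(self.text_encoder(token_ids))

        def forward_speech(self, waveform):
            # ...then reuse it on speech, since both encoders share a space.
            return self.classifier(self.speech_encoder(waveform))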

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

no code implementations • 14 Sep 2023 • Ju-chieh Chou, Chung-Ming Chien, Karen Livescu

In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data.

Resynthesis, Speech Enhancement

What Do Self-Supervised Speech Models Know About Words?

1 code implementation • 30 Jun 2023 • Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu

Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks.

Sentence, Sentence Similarity +1

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

no code implementations • 16 Feb 2022 • Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

It uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system, marking a conceptual shift in the TTS paradigm by framing the few-shot TTS problem as a VC task (see the pipeline sketch below).

Speech Synthesis, Voice Conversion
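
As a rough pipeline sketch of the idea (hypothetical tts_system and vc_module objects, not the paper's API): the existing TTS system synthesizes in its own high-quality voice, and the VC module post-processes the result toward the target speaker using the few-shot references.

    # Few-shot TTS framed as voice conversion: a two-stage pipeline sketch.
    # `tts_system` and `vc_module` are hypothetical objects, not a real API.
    def voice_filter_tts(text, target_reference_wavs, tts_system, vc_module):
        # Step 1: synthesize with the pre-existing TTS system in its own voice.
        source_wav = tts_system.synthesize(text)
        # Step 2: convert the synthesized speech toward the target speaker,
        # conditioned on only a few reference recordings.
        return vc_module.convert(source_wav, target_reference_wavs)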

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

3 code implementations • 7 Apr 2021 • Jheng-Hao Lin, Yist Y. Lin, Chung-Ming Chien, Hung-Yi Lee

AUTOVC used d-vectors to extract speaker information, while FragmentVC used self-supervised learning (SSL) features such as wav2vec 2.0 to extract phonetic content information (both feature streams are sketched below).

Self-Supervised Learning, Voice Conversion
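
The two feature streams mentioned in the abstract can be reproduced with public tooling (torchaudio and the third-party resemblyzer package serve as stand-ins here, not the S2VC repository itself): wav2vec 2.0 features for phonetic content, and a d-vector-style speaker embedding.

    # Content features via wav2vec 2.0 (torchaudio) and a d-vector-style
    # speaker embedding (resemblyzer). Illustrative, not the S2VC codebase.
    import torch
    import torchaudio
    from resemblyzer import VoiceEncoder, preprocess_wav

    bundle = torchaudio.pipelines.WAV2VEC2_BASE
    w2v2 = bundle.get_model().eval()
    waveform, sr = torchaudio.load("source.wav")       # assumed mono file
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.inference_mode():
        layer_feats, _ = w2v2.extract_features(waveform)
    content = layer_feats[-1]                          # (1, frames, 768)

    dvector = VoiceEncoder().embed_utterance(preprocess_wav("target.wav"))
    print(content.shape, dvector.shape)                # d-vector: (256,)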

Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech

1 code implementation • 6 Mar 2021 • Chung-Ming Chien, Jheng-Hao Lin, Chien-yu Huang, Po-chun Hsu, Hung-Yi Lee

The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances whose voice and speaking style resemble those of a reference speaker, given only a few reference samples (a rough conditioning sketch follows below).

Voice Cloning, Voice Conversion
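
As an assumption-laden illustration of the title's idea (not the released code), a conditioning module can concatenate a fixed pretrained speaker embedding, such as a d-vector, with a learnable per-speaker embedding and project the result to the TTS model's conditioning size.

    # Hypothetical sketch: fuse a pretrained speaker embedding with a
    # learnable per-speaker table to condition a multi-speaker TTS model.
    import torch
    import torch.nn as nn

    class SpeakerConditioner(nn.Module):
        def __init__(self, num_speakers, pretrained_dim=256,
                     learnable_dim=128, cond_dim=256):
            super().__init__()
            self.table = nn.Embedding(num_speakers, learnable_dim)
            self.proj = nn.Linear(pretrained_dim + learnable_dim, cond_dim)

        def forward(self, pretrained_emb, speaker_id):
            # pretrained_emb: (B, pretrained_dim), e.g. a frozen d-vector.
            fused = torch.cat([pretrained_emb, self.table(speaker_id)], dim=-1)
            return self.proj(fused)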

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

1 code implementation • 12 Nov 2020 • Chung-Ming Chien, Hung-Yi Lee

Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks.

Speech Synthesis

FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention

2 code implementations • 27 Oct 2020 • Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, Hung-Yi Lee, Lin-shan Lee

Any-to-any voice conversion aims to convert voices between arbitrary speakers, including speakers unseen during training; this is much more challenging than one-to-one or many-to-many conversion, but much more attractive in real-world scenarios (a cross-attention sketch follows below).

Disentanglement, Speaker Verification +1
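
The fragment extraction and fusing in the title can be sketched with ordinary cross-attention (an illustration under that assumption, not the released FragmentVC model): source content features act as queries over frames of the target speaker's audio, so the output is assembled from target-voice fragments that match the source content.

    # Cross-attention sketch of fragment fusing; shapes are illustrative.
    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)
    src_content = torch.randn(1, 200, 768)    # content features of source speech
    tgt_fragments = torch.randn(1, 500, 768)  # frames from the target speaker
    fused, weights = attn(src_content, tgt_fragments, tgt_fragments)
    # `fused` (1, 200, 768) would then be decoded to a spectrogram/waveform.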
