Search Results for author: Yi-Chiao Wu

Found 28 papers, 15 papers with code

ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter

no code implementations • 22 Jan 2024 • Yi-Chiao Wu, Dejan Marković, Steven Krenn, Israel D. Gebru, Alexander Richard

Although recent mainstream waveform-domain end-to-end (E2E) neural audio codecs achieve impressive coded audio quality with a very low bitrate, the quality gap between the coded and natural audio is still significant.

Generative Adversarial Network

Audiobox: Unified Audio Generation with Natural Language Prompts

no code implementations • 25 Dec 2023 • Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu

Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single modality (speech, sound, or music) through adopting more powerful generative models and scaling data.

Ranked #1 on Audio Generation on AudioCaps

AudioCaps Audio Generation

AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec

2 code implementations • 26 May 2023 • Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, Alexander Richard

A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e., the bitrate required to transmit the signal should be as low as possible; (2) latency, i.e., encoding and decoding the signal need to be fast enough to enable communication with no, or only minimal, noticeable delay; and (3) reconstruction quality of the signal.

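As a rough illustration of the compression and latency properties listed in the AudioDec entry, the sketch below computes the bitrate of a generic quantized codec that sends one index per codebook per frame, plus the duration of audio one frame covers. The sample rate, hop size, and codebook settings are hypothetical values chosen for the arithmetic, not AudioDec's published configuration, and one frame of buffering is only one contributor to end-to-end latency.

```python
import math


def codec_bitrate_bps(sample_rate_hz: int, hop_size: int, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second when every frame sends one index per codebook."""
    frames_per_second = sample_rate_hz / hop_size
    bits_per_frame = num_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame


def frame_duration_ms(sample_rate_hz: int, hop_size: int) -> float:
    """Audio covered by one frame; a frame-based encoder must buffer at least this much."""
    return 1000.0 * hop_size / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical settings for illustration only.
    sr, hop, n_cb, cb_size = 24_000, 320, 8, 1024
    print(f"bitrate: {codec_bitrate_bps(sr, hop, n_cb, cb_size) / 1000:.1f} kbps")  # 6.0 kbps
    print(f"frame duration: {frame_duration_ms(sr, hop):.1f} ms")  # 13.3 ms
```

With the codebooks fixed, halving the hop size halves the frame duration but doubles the bitrate, which is one simple view of the tension between properties (1) and (2).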

Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder

no code implementations • 27 Oct 2022 • Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

Our previous work, the unified source-filter GAN (uSFGAN) vocoder, introduced a novel architecture based on the source-filter theory into the parallel waveform generative adversarial network to achieve high voice quality and pitch controllability.

Generative Adversarial Network

A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System

no code implementations • 13 Jul 2022 • Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda

Neural text-to-speech (TTS) systems achieve very high-fidelity speech generation thanks to rapid developments in neural networks.

Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

no code implementations • 12 May 2022 • Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

To improve the source excitation modeling and generated sound quality, a new source excitation generation network separately generating periodic and aperiodic components is proposed.

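The periodic/aperiodic split described in the harmonic-plus-noise uSFGAN entry can be illustrated with the classic hand-crafted harmonic-plus-noise excitation: a sum of sinusoids at integer multiples of F0 in voiced regions plus noise everywhere. This is a signal-processing sketch, not the uSFGAN network (which generates both components with neural networks); the function name, sample rate, harmonic count, and noise level are assumptions.

```python
import numpy as np


def harmonic_plus_noise(f0_hz, sample_rate=24_000, num_harmonics=8, noise_std=0.03, seed=0):
    """Return (periodic, aperiodic, excitation) for a per-sample F0 contour in Hz.

    Samples with F0 <= 0 are treated as unvoiced: they get no periodic part.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    # Instantaneous phase of the fundamental, accumulated sample by sample.
    phase = 2.0 * np.pi * np.cumsum(f0) / sample_rate
    periodic = np.zeros_like(f0)
    for k in range(1, num_harmonics + 1):
        # Keep only harmonics below Nyquist to avoid aliasing.
        keep = voiced & (k * f0 < sample_rate / 2)
        periodic += np.where(keep, np.sin(k * phase), 0.0)
    periodic /= num_harmonics  # keeps |periodic| <= 1
    rng = np.random.default_rng(seed)
    aperiodic = noise_std * rng.standard_normal(f0.shape)
    return periodic, aperiodic, periodic + aperiodic


if __name__ == "__main__":
    sr = 24_000
    # 0.1 s voiced at 200 Hz followed by 0.1 s unvoiced (F0 = 0).
    f0 = np.concatenate([np.full(sr // 10, 200.0), np.zeros(sr // 10)])
    periodic, aperiodic, excitation = harmonic_plus_noise(f0, sample_rate=sr)
    print(excitation.shape)  # (4800,)
```

Here the noise level is the same in voiced and unvoiced regions for simplicity; practical source models usually weight the aperiodic part differently depending on voicing and frequency band.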

HASA-net: A non-intrusive hearing-aid speech assessment network

no code implementations • 10 Nov 2021 • Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao

Because they do not need a clean reference, non-intrusive speech assessment methods have attracted considerable attention for objective evaluation.

Relational Data Selection for Data Augmentation of Speaker-dependent Multi-band MelGAN Vocoder

no code implementations • 10 Jun 2021 • Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda

Nowadays, neural vocoders can generate very high-fidelity speech when a large amount of training data is available.

Data Augmentation Speaker Verification

Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN

1 code implementation • 10 Apr 2021 • Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

We propose a unified approach to data-driven source-filter modeling using a single neural network for developing a neural vocoder capable of generating high-quality synthetic speech waveforms while retaining flexibility of the source-filter model to control their voice characteristics.

The AS-NU System for the M2VoC Challenge

no code implementations • 7 Apr 2021 • Cheng-Hung Hu, Yi-Chiao Wu, Wen-Chin Huang, Yu-Huai Peng, Yu-Wen Chen, Pin-Jui Ku, Tomoki Toda, Yu Tsao, Hsin-Min Wang

The first track focuses on voice cloning from a small set of 100 target utterances, while the second track uses only 5 target utterances.

Voice Cloning

crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder

1 code implementation • 4 Mar 2021 • Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda

In this paper, we present crank, open-source software for developing nonparallel voice conversion (VC) systems.

Voice Conversion

Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations

no code implementations • 23 Oct 2020 • Wen-Chin Huang, Yi-Chiao Wu, Tomoki Hayashi, Tomoki Toda

Given a training dataset of the target speaker, we extract VQW2V and acoustic features to estimate a seq2seq mapping function from the former to the latter.

Voice Conversion

The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders

no code implementations • 9 Oct 2020 • Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda

In this paper, we present the voice conversion (VC) systems developed at Nagoya University (NU) for the Voice Conversion Challenge 2020 (VCC2020).

Task 2 Voice Conversion

Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN

1 code implementation • 9 Oct 2020 • Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Toda

In this paper, we present a description of the baseline system of Voice Conversion Challenge (VCC) 2020 with a cyclic variational autoencoder (CycleVAE) and Parallel WaveGAN (PWG), i.e., CycleVAEPWG.

Generative Adversarial Network Task 2

Pretraining Techniques for Sequence-to-Sequence Voice Conversion

1 code implementation • 7 Aug 2020 • Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda

Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody.

Automatic Speech Recognition (ASR)

Quasi-Periodic Parallel WaveGAN: A Non-autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

1 code implementation • 25 Jul 2020 • Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda

To improve the pitch controllability and speech modeling capability, we apply a QP structure with PDCNNs to PWG, which introduces pitch information to the network by dynamically changing the network architecture according to the auxiliary F0 feature.

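One way to picture how the auxiliary F0 changes the architecture in the Quasi-Periodic Parallel WaveGAN entry is a pitch-dependent dilation. We assume here, following our reading of the QPNet papers, that each step's dilation is the base dilation times round(fs / (F0 · a)), where a is a hyperparameter (a "dense factor", i.e., taps per pitch period). The unvoiced fallback, all parameter values, and the one-tap toy convolution are hypothetical stand-ins, not a real PDCNN layer.

```python
from typing import List, Sequence


def pitch_dependent_dilations(f0_hz: Sequence[float], sample_rate: int = 24_000,
                              base_dilation: int = 1, dense_factor: int = 4) -> List[int]:
    """Dilation per step: base_dilation * round(sample_rate / (f0 * dense_factor))."""
    dilations = []
    for f0 in f0_hz:
        if f0 <= 0:
            # Assumption: unvoiced steps fall back to the fixed base dilation.
            dilations.append(base_dilation)
        else:
            factor = max(1, round(sample_rate / (f0 * dense_factor)))
            dilations.append(base_dilation * factor)
    return dilations


def pitch_dependent_conv(x: Sequence[float], dilations: Sequence[int],
                         w_now: float = 0.5, w_past: float = 0.5) -> List[float]:
    """Toy causal two-tap convolution whose second tap is x[t - dilations[t]]."""
    out = []
    for t, d in enumerate(dilations):
        past = x[t - d] if t - d >= 0 else 0.0  # zero padding before the signal start
        out.append(w_now * x[t] + w_past * past)
    return out


if __name__ == "__main__":
    print(pitch_dependent_dilations([100.0, 200.0, 400.0, 0.0], base_dilation=2))
    # [120, 60, 30, 2]
```

Under this assumption, the tap spacing stays a fixed fraction of the pitch period: with a = 4 and a base dilation of 1, the second tap lands about a quarter period back whatever the F0, whereas a fixed dilated convolution uses the same spacing for every pitch.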

Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

1 code implementation • 11 Jul 2020 • Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda

In this paper, a pitch-adaptive waveform generative model named Quasi-Periodic WaveNet (QPNet) is proposed to improve the limited pitch controllability of vanilla WaveNet (WN) using pitch-dependent dilated convolution neural networks (PDCNNs).

Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation

1 code implementation • 18 May 2020 • Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda

In this paper, we propose a parallel WaveGAN (PWG)-like neural vocoder with a quasi-periodic (QP) architecture to improve the pitch controllability of PWG.

Audio and Speech Processing Sound

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

1 code implementation • 14 Dec 2019 • Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based on the Transformer architecture with text-to-speech (TTS) pretraining.

Voice Conversion

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling

Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.

Person Recognition Speaker Verification

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

2 code implementations • 24 Jul 2019 • Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

In this work, to overcome this problem, we propose a CycleVAE-based spectral model that indirectly optimizes the conversion flow by recycling the converted features back into the system to obtain corresponding cyclic reconstructed spectra that can be directly optimized.

Voice Conversion

Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder

1 code implementation • 21 Jul 2019 • Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

However, because of the fixed dilated convolution and generic network architecture, the WN vocoder lacks robustness against unseen input features and often requires a huge network size to achieve acceptable speech quality.

Audio and Speech Processing Sound

Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation

1 code implementation • 1 Jul 2019 • Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda

In this paper, we propose a quasi-periodic neural network (QPNet) vocoder with a novel network architecture named pitch-dependent dilated convolution (PDCNN) to improve the pitch controllability of WaveNet (WN) vocoder.

Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

1 code implementation • 2 May 2019 • Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

This hypothesis implies that, during the conversion phase, the latent codes and the converted features in VAE-based VC are in fact dependent on the source F0.

Disentanglement Voice Conversion

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

no code implementations • 27 Nov 2018 • Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation.

Voice Conversion

Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

1 code implementation • 4 Apr 2017 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios.

Generative Adversarial Network Voice Conversion

Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

4 code implementations • 13 Oct 2016 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.

Voice Conversion

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

no code implementations • 13 Oct 2016 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high-dimensional data in a spectral conversion (SC) task.

Speech Enhancement Speech Synthesis

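For context on the NMF setting in the dictionary-update entry, a nonnegative spectrogram V (frequency × frames) is approximated as W H, with a nonnegative dictionary W and activations H. The sketch uses the standard Lee and Seung multiplicative updates for the Euclidean cost; it is not the paper's encoder-decoder dictionary update, and the function name, rank, iteration count, and random initialization are assumptions.

```python
import numpy as np


def nmf_multiplicative(V: np.ndarray, rank: int, iters: int = 200, seed: int = 0, eps: float = 1e-9):
    """Factor nonnegative V (freq x frames) as W @ H with Euclidean multiplicative updates."""
    if np.any(V < 0):
        raise ValueError("V must be nonnegative")
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps  # dictionary (spectral bases)
    H = rng.random((rank, n_frames)) + eps  # activation of each basis per frame
    for _ in range(iters):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((64, 3)) @ rng.random((3, 100))  # exactly rank 3 and nonnegative
    W, H = nmf_multiplicative(V, rank=3, iters=500)
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    print(f"relative error: {rel_err:.3f}")
```

In exemplar-style NMF voice conversion, activations estimated against a source dictionary are typically combined with a paired target dictionary to produce converted spectra; how those dictionaries are built or updated for high-dimensional features is the question the paper addresses.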
