NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model.
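The retrieval-and-mixing idea behind this sentence can be sketched as follows. This is a minimal illustration under stated assumptions, not NASTAR's actual implementation. It assumes that a noise extractor has already produced a fixed-length embedding of the target-domain noise. It also assumes that each noise clip in a pool has a precomputed embedding. It then retrieves the k most similar clips by cosine similarity and mixes a retrieved clip into clean speech at a chosen SNR to simulate noisy training data. The feedback mechanism itself is not modeled. The names `retrieve_noises`, `mix_at_snr`, and the toy pool are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_noises(query_emb, noise_pool, k):
    """Hypothetical retrieval step: return the names of the k pool entries
    whose embeddings are most similar to query_emb (assumed to come from a
    noise extractor applied to target-domain audio)."""
    ranked = sorted(noise_pool, key=lambda item: cosine(query_emb, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech so the result has the requested SNR in dB."""
    # Tile (or truncate) the noise to the clean signal's length.
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(n * n for n in tiled) / len(tiled)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # Choose scale so that 10*log10(p_clean / (scale**2 * p_noise)) == snr_db.
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(clean, tiled)]


# Toy pool of (name, embedding) pairs; real embeddings would be learned.
pool = [("babble", [1.0, 0.0]), ("car", [0.0, 1.0]), ("cafe", [0.9, 0.1])]
print(retrieve_noises([1.0, 0.0], pool, 2))  # ['babble', 'cafe']
```

Ranking by cosine similarity keeps the retrieval independent of embedding magnitude. Scaling the noise by measured power gives exact control over the simulated SNR.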
Neural evaluation metrics developed for numerous speech generation tasks have recently attracted considerable attention.
Neural vocoders can now generate very high-fidelity speech when a large amount of training data is available.
The first track focuses on voice cloning from a small set of 100 target utterances, while the second track uses only 5 target utterances.
This paper describes the Academia Sinica systems for the two tasks of Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2).