We formulated non-speech vocalization (NSV) modeling as a text-to-speech task and verified its viability.
This work analyzes whether 3D face models can be learned from speakers' speech inputs alone.
Robust speaker recognition, including in the presence of malicious attacks, is becoming increasingly important, especially given the proliferation of smart speakers and personal agents that respond to an individual's voice commands to perform diverse and even sensitive tasks.
Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios.
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.
In this paper, we propose a dictionary update method for nonnegative matrix factorization (NMF) with high-dimensional data in a spectral conversion (SC) task.