We present EdiTTS, an off-the-shelf speech editing methodology based on score-based generative modeling for text-to-speech synthesis.
In data-parallel training, we reorder the gradient computations to maximize the overlapping of computation and parameter communication; in pipeline-parallel training, we prioritize critical gradient computations to reduce the pipeline stalls. We evaluate our optimizations with twelve neural networks including a light-weight computer vision model (MobileNet) and largeNLP models (BERT and GPT-3) with up to forty eight V100 GPUs. Our scheduling algorithms effectively improve the performance of single-GPU training as well as data- and pipeline-parallel training. Compared to the respective state of the art training systems, the throughput is substantially improved for single-GPU, data-parallel, and pipeline-parallel training.
Although neural text-to-speech (TTS) models have attracted a lot of attention and succeeded in generating human-like speech, there is still room for improvements to its naturalness and architectural efficiency.
Photoplethysmogram (PPG) signal-based blood pressure (BP) estimation is a promising candidate for modern BP measurements, as PPG signals can be easily obtained from wearable devices in a non-invasive manner, allowing quick BP measurement.
In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real-time.
Flow-based generative models are composed of invertible transformations between two random variables of the same dimension.
Ranked #1 on Point Cloud Generation on ShapeNet Airplane