For the refiner, we train a diffusion-based generative model on a dataset consisting only of clean speech.
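The clean-speech-only training can be illustrated with the standard epsilon-prediction diffusion objective: corrupt a clean waveform with scheduled Gaussian noise and train the model to predict that noise. The linear schedule, step count, and the placeholder model below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear noise schedule (the paper's actual schedule is not given here).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def diffusion_training_step(x0, eps_model):
    """One epsilon-prediction training step on a clean-speech sample x0."""
    t = rng.integers(0, T)
    eps = rng.standard_normal(x0.shape)
    # Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    pred = eps_model(xt, t)
    return np.mean((pred - eps) ** 2)  # MSE between true and predicted noise

# Toy usage: a placeholder "model" that always predicts zero noise.
x0 = rng.standard_normal(16000)  # one second of clean speech at 16 kHz
loss = diffusion_training_step(x0, lambda xt, t: np.zeros_like(xt))
```

Because the objective only needs clean signals plus synthetic noise, no paired noisy/clean corpus is required.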
In this paper, we propose DiffRoll, a novel generative approach to automatic music transcription (AMT).
To optimize the DNN-based SE model with respect to the character error rate (CER), a standard metric for evaluating ASR systems that is generally non-differentiable, our method uses two DNNs: one for speech processing and one for mimicking the CERs derived through an acoustic model (AM).
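The key idea, that a second network stands in for the non-differentiable CER, can be sketched minimally: fit a differentiable surrogate to (feature, measured-CER) pairs and take gradients through the surrogate instead of the true metric. The toy `measured_cer` function and the linear mimic below are assumptions for illustration; the paper's mimicking network is a DNN.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the non-differentiable CER produced via an ASR/acoustic
# model (assumption; the real CER comes from decoding actual transcripts).
def measured_cer(features):
    return float(np.clip(0.5 - 0.1 * features.sum(), 0.0, 1.0))

# Collect (feature, CER) pairs from the SE front end, then fit a
# differentiable mimic (here simply linear least squares for brevity).
X = rng.standard_normal((200, 4))
y = np.array([measured_cer(x) for x in X])
Xb = np.hstack([X, np.ones((200, 1))])  # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def mimic_cer(features):
    """Differentiable CER estimate; its gradient w.r.t. features is w[:-1]."""
    return float(np.append(features, 1.0) @ w)

grad = w[:4]  # gradient signal usable to train the SE network
```

The SE network is then updated to minimize `mimic_cer`, which is differentiable even though the measured CER is not.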
This paper presents a new deep clustering (DC) method called manifold-aware DC (M-DC) that can enhance hyperspace utilization more effectively than the original DC.
This paper proposes several improvements for music separation with deep neural networks (DNNs), namely a multi-domain loss (MDL) and two combination schemes.
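A multi-domain loss of this kind can be sketched as a weighted sum of a time-domain error and a magnitude-spectrogram error. The weighting `alpha`, the FFT size, and the hop length below are illustrative choices, not the paper's settings.

```python
import numpy as np

def multi_domain_loss(est, ref, alpha=0.5, n_fft=512, hop=128):
    """Illustrative multi-domain loss: weighted sum of a time-domain MSE and
    a magnitude-spectrogram MSE (alpha, n_fft, hop are assumed values)."""
    time_loss = np.mean((est - ref) ** 2)

    def mag_spec(x):
        # Frame the signal, apply a Hann window, keep magnitude spectra.
        frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
        return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1))

    spec_loss = np.mean((mag_spec(est) - mag_spec(ref)) ** 2)
    return alpha * time_loss + (1.0 - alpha) * spec_loss

# Usage: a reference source and a slightly perturbed estimate.
rng = np.random.default_rng(2)
ref = rng.standard_normal(4096)
est = ref + 0.1 * rng.standard_normal(4096)
```

Combining both domains penalizes errors that a single-domain loss can miss, e.g. phase-consistent waveform errors versus spectral-envelope errors.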