We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio.
In this paper, we study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio.
In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement.
In this work, we propose Exformer, a time-domain architecture for target speaker extraction.
In this paper, we work on a sound recognition system that continually incorporates new sound classes.
Singing voice separation aims to separate music into vocals and accompaniment components.
We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients.
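The snippet above does not specify FEDENHANCE's aggregation rule; as a hedged illustration, federated learning over non-IID clients is commonly built on a data-size-weighted average of client model weights (FedAvg). The function name and the two-client toy data below are hypothetical.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Aggregate client model weights by a data-size-weighted average (FedAvg-style)."""
    coeffs = np.array(client_sizes, dtype=float) / sum(client_sizes)
    stacked = np.stack(client_weights)        # (num_clients, ...)
    # Weighted sum over the client axis.
    return np.tensordot(coeffs, stacked, axes=1)

# Two clients with different (non-IID) amounts of local data.
w = fed_avg([np.array([1.0, 3.0]), np.array([3.0, 1.0])], client_sizes=[1, 3])
# The client with 3x the data contributes 3x the weight to the average.
```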
Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem.

Given a limited set of labeled data, we present a method to leverage a large volume of unlabeled data to improve the model's performance.
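The snippet does not name the semi-supervised method used; a common baseline for leveraging unlabeled data, shown here as a hedged sketch, is confidence-thresholded pseudo-labeling. The `model` callable and toy probabilities are stand-ins, not the paper's actual system.

```python
import numpy as np

def pseudo_label(model, unlabeled_x, threshold=0.9):
    """Keep unlabeled examples whose predicted class confidence exceeds a threshold."""
    probs = model(unlabeled_x)                # (n, num_classes) class probabilities
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = conf >= threshold
    return unlabeled_x[keep], labels[keep]

# Toy "model": fixed softmax outputs for three unlabeled examples.
fake_probs = np.array([[0.95, 0.05], [0.60, 0.40], [0.05, 0.95]])
x = np.arange(3)
xs, ys = pseudo_label(lambda _: fake_probs, x)
# Only the two confident examples survive and receive pseudo-labels.
```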
Gradient-based planners are widely used for quadrotor local planning, in which a Euclidean Signed Distance Field (ESDF) is crucial for evaluating gradient magnitude and direction.
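Evaluating the ESDF gradient at a grid cell is typically done with finite differences; the sketch below uses central differences on a 2-D grid. The grid resolution and the point-obstacle test field are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def esdf_gradient(esdf, idx, voxel_size=0.1):
    """Central-difference gradient of a 2-D ESDF grid at an interior cell index."""
    i, j = idx
    gx = (esdf[i + 1, j] - esdf[i - 1, j]) / (2 * voxel_size)
    gy = (esdf[i, j + 1] - esdf[i, j - 1]) / (2 * voxel_size)
    return np.array([gx, gy])

# Test field: distance (in meters, 0.1 m voxels) from a point obstacle at the origin.
xs, ys = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
esdf = 0.1 * np.sqrt(xs**2 + ys**2)
g = esdf_gradient(esdf, (2, 3))
# For a true distance field the gradient magnitude is ~1 and points away from the obstacle.
```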
In this paper, we present an efficient neural network for end-to-end general purpose audio source separation.
In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal.
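The oracle masking referred to above can be illustrated with an ideal ratio mask: in a nonnegative transform domain, each bin of the mixture is scaled by the fraction of its energy belonging to the target source. The toy "latent" vectors below are assumptions for illustration.

```python
import numpy as np

def oracle_ratio_mask(source_mag, mix_mag, eps=1e-8):
    """Ideal ratio mask: per-bin fraction of the mixture owned by the source."""
    return source_mag / (mix_mag + eps)

# Two nonnegative "latent" representations and their additive mixture.
s1 = np.array([1.0, 0.0, 2.0])
s2 = np.array([1.0, 4.0, 0.0])
mix = s1 + s2
est1 = oracle_ratio_mask(s1, mix) * mix   # applying the oracle mask to the mixture
```

When the learned transform makes sources approximately additive and nonnegative, this oracle recovers the target almost exactly, which is the property the first training step optimizes for.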
We show that, by incrementally refining a classifier with generative replay, a generator whose size is 4% of all previous training data matches the performance of refining the classifier while retaining 20% of all previous training data.
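Generative replay, as described above, trains on new-task data mixed with samples drawn from a generator of past classes instead of stored data. The sketch below is a minimal assumed interface: `generator` (a stand-in mapping a class label to a synthetic sample) and the batch sizes are hypothetical.

```python
import numpy as np

def replay_batch(new_x, new_y, generator, old_classes, n_replay, rng):
    """Mix new-task data with samples replayed from a generator of past classes."""
    old_y = rng.choice(old_classes, size=n_replay)          # labels to replay
    old_x = np.stack([generator(c) for c in old_y])         # synthetic samples
    x = np.concatenate([new_x, old_x])
    y = np.concatenate([new_y, old_y])
    return x, y

rng = np.random.default_rng(0)
gen = lambda c: np.full(2, float(c))   # stand-in generator, not a trained model
x, y = replay_batch(np.zeros((3, 2)), np.array([5, 5, 5]), gen, [0, 1], 4, rng)
# The classifier is then refined on this mixed batch, so no old data is stored.
```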