In this work, we propose Exformer, a time-domain architecture for target speaker extraction.
In real-world conditions, room reverberation and background noise degrade the quality of speech.
Singing voice separation aims to separate music into vocals and accompaniment components.
The presence of multiple talkers in the surrounding environment poses a difficult challenge for real-time speech communication systems, given the constraints on network size and computational complexity.
Given a limited set of labeled data, we present a method to leverage a large volume of unlabeled data to improve the model's performance.
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output.
Neural network applications generally benefit from larger models, but for current speech enhancement models, larger-scale networks often become less robust to the variety of real-world use cases that go beyond what is encountered in the training data.
Supervised deep learning has recently gained significant attention for speech enhancement.
We present enhancements to a speech-to-speech translation pipeline in order to perform automatic dubbing.
We propose a novel method called the Relevance Subject Machine (RSM) to solve the person re-identification (re-id) problem.
In this paper, we present a novel Bayesian approach to simultaneously recover block-sparse signals in the presence of outliers.
We show that the proposed framework encompasses a large class of sparse non-negative least squares (S-NNLS) algorithms, and we provide a computationally efficient inference procedure based on multiplicative update rules.
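To illustrate what a multiplicative update rule looks like, here is a minimal sketch of the classic Lee and Seung style update for plain (non-sparse) NNLS, minimizing ||b - Ax||^2 subject to x >= 0. This is not the S-NNLS inference procedure described above. It assumes A and b have nonnegative entries so that every iterate stays nonnegative, and the function name `nnls_multiplicative` is hypothetical.

```python
def nnls_multiplicative(A, b, iters=500, eps=1e-12):
    """Approximately minimize ||b - A x||^2 subject to x >= 0.

    Assumes A (list of rows) and b have nonnegative entries, so the
    elementwise multiplicative update never produces negative values.
    """
    m, n = len(A), len(A[0])
    # Precompute A^T b (length n) and the Gram matrix A^T A (n x n).
    Atb = [sum(A[i][j] * b[i] for i in range(m)) for j in range(n)]
    AtA = [[sum(A[i][j] * A[i][k] for i in range(m)) for k in range(n)]
           for j in range(n)]
    x = [1.0] * n  # strictly positive starting point
    for _ in range(iters):
        AtAx = [sum(AtA[j][k] * x[k] for k in range(n)) for j in range(n)]
        # Elementwise rule: x_j <- x_j * (A^T b)_j / (A^T A x)_j
        # (eps guards against division by zero).
        x = [x[j] * Atb[j] / (AtAx[j] + eps) for j in range(n)]
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    b = [2.0, 1.0, 2.5]  # equals A @ [2.0, 0.5]
    print(nnls_multiplicative(A, b))
```

Because the update multiplies each coordinate by a nonnegative ratio, a coordinate that reaches zero stays at zero. Sparsity-promoting variants such as S-NNLS typically modify the denominator with prior-dependent weights, but that detail is outside this sketch.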
In this paper, we propose a generalized scale mixture family of distributions, namely the Power Exponential Scale Mixture (PESM) family, to model the sparsity-inducing priors currently in use for sparse signal recovery (SSR).