no code implementations • 27 Jul 2023 • Dimitrios Bralios, Efthymios Tzinis, Paris Smaragdis
Recent approaches to source separation leverage semantic information about their input mixtures and constituent sources which, when used in conditional separation models, can achieve impressive performance.
1 code implementation • 22 Nov 2022 • Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux
During inference, we can dynamically adjust how many processing blocks and iterations of a specific block an input signal needs using a gating module.
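A minimal sketch of this dynamic-depth idea, with a toy convergence criterion standing in for the learned gating module (the block, gate, and stopping rule below are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def block(x, w):
    """One toy processing block (a stand-in for a separation network block)."""
    return np.tanh(w * x)

def gate(x_prev, x_new, tol=1e-3):
    """Toy gating criterion: stop when the block's refinement stalls.
    The paper instead learns this decision with a gating module."""
    return np.mean(np.abs(x_new - x_prev)) < tol

def dynamic_forward(x, weights, max_iters=5):
    """Dynamically choose how many blocks, and how many iterations of
    each block, an input signal needs at inference time."""
    blocks_used = 0
    for w in weights:
        converged = False
        for _ in range(max_iters):  # dynamic iterations of this block
            y = block(x, w)
            if gate(x, y):
                converged = True
                break
            x = y
        blocks_used += 1
        if converged:
            break  # dynamic depth: skip the remaining blocks
    return x, blocks_used

x, used = dynamic_forward(np.array([0.5]), weights=[0.9, 0.9, 0.9])
```

Easy inputs exit after few blocks/iterations, so compute cost adapts per signal.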
1 code implementation • 11 Nov 2022 • Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux
Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries.
no code implementations • 20 Jul 2022 • Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey
We identify several limitations of previous work on audio-visual on-screen sound separation, including the coarse resolution of spatio-temporal attention, poor convergence of the audio separation model, limited variety in training and evaluation data, and failure to account for the trade-off between preservation of on-screen sounds and suppression of off-screen sounds.
1 code implementation • 15 May 2022 • Zhepei Wang, Cem Subakan, Xilin Jiang, Junkai Wu, Efthymios Tzinis, Mirco Ravanelli, Paris Smaragdis
In this paper, we work on a sound recognition system that continually incorporates new sound classes.
no code implementations • 7 Apr 2022 • Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux
We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.).
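One common way to inject such concept queries into a separator is feature-wise modulation; the FiLM-style mechanism below is an assumed illustration, not necessarily the paper's exact conditioning scheme, and all names and shapes are toy choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def condition(features, query_embedding, scale_w, shift_w):
    """FiLM-style conditioning: an embedded query about a concept such as
    loudness, gender, or language modulates the separator's internal
    features so that the matching source is extracted."""
    scale = scale_w @ query_embedding
    shift = shift_w @ query_embedding
    return scale[:, None] * features + shift[:, None]

feat = rng.standard_normal((8, 100))   # toy separator features (channels x time)
query = rng.standard_normal(4)         # toy embedding of a concept query
scale_w = rng.standard_normal((8, 4))
shift_w = rng.standard_normal((8, 4))
out = condition(feat, query, scale_w, shift_w)
```

Because the queries condition a single shared separator, non-mutually exclusive concepts can target the same source through different descriptions.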
2 code implementations • 17 Feb 2022 • Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar
RemixIT is based on a continuous self-training scheme in which a pre-trained teacher model on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.
1 code implementation • 19 Oct 2021 • Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar
Specifically, a separation teacher model is pre-trained on an out-of-domain dataset and is used to infer estimated target signals for a batch of in-domain mixtures.
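The teacher-student bootstrapping loop can be sketched as follows; the fixed-mask "teacher" and all data here are toy stand-ins for the pre-trained network and real in-domain mixtures:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_separate(mixture):
    """Stand-in for a pre-trained out-of-domain teacher: here a fixed toy
    split; in practice this is a trained separation network."""
    est_speech = 0.7 * mixture
    est_noise = 0.3 * mixture
    return est_speech, est_noise

def remix(est_speech, est_noise):
    """Shuffle the estimated noises across the batch and remix them with
    the estimated speech, yielding new (input, pseudo-target) pairs for
    training a student on in-domain data without ground truth."""
    perm = rng.permutation(len(est_noise))
    new_mixtures = est_speech + est_noise[perm]
    return new_mixtures, est_speech

# A batch of in-domain mixtures (toy data, 4 one-second clips at 16 kHz).
mixtures = rng.standard_normal((4, 16000))
s, n = teacher_separate(mixtures)
student_inputs, pseudo_targets = remix(s, n)
```

The student is then trained to recover `pseudo_targets` from `student_inputs`, and in the continuous variant the improved student periodically replaces the teacher.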
no code implementations • 17 Jun 2021 • Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey
We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos.
1 code implementation • 11 May 2021 • Efthymios Tzinis, Jonah Casebeer, Zhepei Wang, Paris Smaragdis
We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients.
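The federated training pattern underneath can be sketched with a FedAvg-style round; the least-squares local objective is a toy stand-in for the enhancement losses, and client data here is synthetic:

```python
import numpy as np

def local_update(weights, client_data, lr=0.1, steps=5):
    """One round of local training on a client's own (non-IID) data.
    Toy objective: least squares; FEDENHANCE trains a speech model."""
    X, y = client_data
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Server round: each client trains locally on data that never leaves
    the device, then the server averages the returned weights."""
    local_ws = [local_update(global_w, c) for c in clients]
    return np.mean(local_ws, axis=0)

rng = np.random.default_rng(1)
clients = [(rng.standard_normal((20, 3)), rng.standard_normal(20))
           for _ in range(4)]
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, clients)
```

Averaging weights rather than sharing audio is what keeps each client's recordings private.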
no code implementations • 14 Apr 2021 • Georgios Paraskevopoulos, Efthymios Tzinis, Nikolaos Ellinas, Theodoros Giannakopoulos, Alexandros Potamianos
We examine the use of linear and non-linear dimensionality reduction algorithms for extracting low-rank feature representations for speech emotion recognition.
3 code implementations • 3 Mar 2021 • Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis
Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem.
Ranked #5 on Speech Separation on WHAMR!
no code implementations • ICLR 2021 • Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey
For evaluation and semi-supervised experiments, we collected human labels for presence of on-screen and off-screen sounds on a small subset of clips.
1 code implementation • 25 Oct 2020 • Efthymios Tzinis, Dimitrios Bralios, Paris Smaragdis
In this paper, we propose a simple, unified gradient reweighting scheme, with a lightweight modification to bias the learning process of a model and steer it towards a certain distribution of results.
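A toy instance of such loss/gradient reweighting, assuming a simple power-law weighting (the paper's exact scheme may differ):

```python
import numpy as np

def reweighted_loss(per_example_losses, alpha=1.0):
    """Unified reweighting sketch: scale each example's loss (and hence its
    gradient) by a function of the loss itself. alpha > 0 emphasizes hard
    examples, alpha = 0 recovers the uniform average, steering the model
    toward a chosen distribution of results."""
    weights = per_example_losses ** alpha
    weights = weights / weights.sum()  # normalize the weighting
    return np.sum(weights * per_example_losses)

losses = np.array([0.1, 0.5, 2.0])
hard_focused = reweighted_loss(losses, alpha=1.0)
uniform = np.mean(losses)
```

With `alpha=1.0` the hardest example dominates the objective, which biases training toward improving worst-case outputs.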
4 code implementations • 14 Jul 2020 • Efthymios Tzinis, Zhepei Wang, Paris Smaragdis
In this paper, we present an efficient neural network for end-to-end general purpose audio source separation.
Ranked #11 on Speech Separation on WHAMR!
no code implementations • NeurIPS 2020 • Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin Wilson, John R. Hershey
In such supervised approaches, a model is trained to predict the component sources from synthetic mixtures created by adding up isolated ground-truth sources.
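This supervised recipe — sum isolated ground-truth sources into a synthetic mixture, then train against the sources under a permutation-invariant loss — can be sketched with toy data (the PIT loss below is a standard choice in separation training, not specific to this paper):

```python
from itertools import permutations

import numpy as np

rng = np.random.default_rng(2)

def pit_mse(estimates, references):
    """Permutation-invariant MSE: score the estimates against every
    ordering of the reference sources and keep the best, since output
    order is arbitrary in source separation."""
    n = len(references)
    best = np.inf
    for perm in permutations(range(n)):
        err = np.mean((estimates - references[list(perm)]) ** 2)
        best = min(best, err)
    return best

# Synthetic supervision: the mixture is the sum of isolated sources.
sources = rng.standard_normal((2, 8000))
mixture = sources.sum(axis=0)

# A perfect separator with swapped outputs should incur zero PIT loss.
swapped = sources[::-1]
```

The mixture is the model input; `sources` is the target the model must recover up to permutation.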
no code implementations • 18 Nov 2019 • Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis
Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification.
2 code implementations • 22 Oct 2019 • Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis
In the first step, we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal.
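A minimal sketch of transform-domain oracle masking, substituting a random orthogonal basis for the learned transform (the learned version optimizes this basis so that oracle masking performs best):

```python
import numpy as np

rng = np.random.default_rng(3)

def analysis(x, basis):
    """Toy 'learned' transform: project the signal onto an orthogonal
    basis; the paper learns this transform and its inverse instead."""
    return basis @ x

def synthesis(z, basis):
    """Inverse transform (exact here because the basis is orthogonal)."""
    return basis.T @ z

def oracle_mask_separate(mixture, target, basis):
    """Masking-based separation in the latent space with an oracle
    (ideal-ratio-style) mask computed from the known target."""
    z_mix = analysis(mixture, basis)
    z_tgt = analysis(target, basis)
    mask = np.clip(np.abs(z_tgt) / (np.abs(z_mix) + 1e-8), 0.0, 1.0)
    return synthesis(mask * z_mix, basis)

# Random orthogonal basis via QR, and toy target/interference signals.
basis, _ = np.linalg.qr(rng.standard_normal((64, 64)))
target = rng.standard_normal(64)
interference = rng.standard_normal(64)
est = oracle_mask_separate(target + interference, target, basis)
```

In the second step, a network is trained to predict such masks in the learned latent space without access to the oracle.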
Ranked #26 on Speech Separation on WSJ0-2mix
no code implementations • 3 Jun 2019 • Zhepei Wang, Cem Subakan, Efthymios Tzinis, Paris Smaragdis, Laurent Charlin
We show that, by incrementally refining a classifier with generative replay, a generator that is only 4% of the size of all previous training data matches the performance obtained by refining the classifier while keeping 20% of all previous training data.
1 code implementation • 4 Feb 2019 • Efthymios Tzinis
The backbone element of the CSMDS framework is a probability matrix that encodes how likely each coordinate is to be evaluated.
1 code implementation • 9 Nov 2018 • Efthymios Tzinis, Georgios Paraskevopoulos, Christos Baziotis, Alexandros Potamianos
We investigate the performance of features that can capture nonlinear recurrence dynamics embedded in the speech signal for the task of Speech Emotion Recognition (SER).
Ranked #49 on Emotion Recognition in Conversation on IEMOCAP
1 code implementation • 5 Nov 2018 • Efthymios Tzinis, Shrikant Venkataramani, Paris Smaragdis
We present a monophonic source separation system that is trained by only observing mixtures with no ground truth separation information.
1 code implementation • 1 Jun 2018 • Georgios Paraskevopoulos, Efthymios Tzinis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Alexandros Potamianos
We present a novel view of nonlinear manifold learning using derivative-free optimization techniques.