Pitch estimation is an essential step of many speech processing algorithms, including speech coding, synthesis, and enhancement.
Recent approaches in source separation leverage semantic information about their input mixtures and constituent sources; when used in conditional separation models, this information can yield impressive performance.
In this paper, we study unsupervised approaches to improving the learning of such representations using unpaired text and audio.
In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement.
GAN vocoders are currently among the state-of-the-art methods for building high-quality neural waveform generative models.
During inference, we can dynamically adjust how many processing blocks and iterations of a specific block an input signal needs using a gating module.
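A minimal sketch of such inference-time gating, assuming a hypothetical `GatedBlock` with a learned halting gate (not the paper's exact module):

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Repeats a processing block until a learned gate says to stop."""
    def __init__(self, dim, max_iters=4):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.gate = nn.Linear(dim, 1)  # scores whether another pass is needed
        self.max_iters = max_iters

    def forward(self, x):
        for _ in range(self.max_iters):
            x = x + self.body(x)                         # one refinement pass
            keep_going = torch.sigmoid(self.gate(x).mean())
            if keep_going < 0.5:                         # gate says "enough"
                break
        return x
```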
Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries.
In this work, we propose Exformer, a time-domain architecture for target speaker extraction.
In this paper, we work on a sound recognition system that continually incorporates new sound classes.
As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC).
We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.).
Neural vocoders have recently demonstrated high-quality speech synthesis, but typically require high computational complexity.
Neural speech synthesis models can synthesize high-quality speech, but typically require high computational complexity to do so.
RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.
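A minimal sketch of one self-training step in this spirit, where `teacher`, `student`, and `loss_fn` are stand-ins rather than the paper's exact components:

```python
import torch

def self_training_step(teacher, student, mixtures, optimizer, loss_fn):
    # Teacher's estimates on unlabeled in-domain mixtures act as labels.
    with torch.no_grad():
        pseudo_targets = teacher(mixtures)
    estimates = student(mixtures)
    loss = loss_fn(estimates, pseudo_targets)  # e.g., negative SI-SDR or L1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```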
We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network.
We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients.
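A minimal sketch of one federated averaging round under this setup, with a hypothetical `client.local_update` performing a few epochs on each client's private data:

```python
import copy
import torch

def fed_round(global_model, clients):
    client_states = []
    for client in clients:
        local = copy.deepcopy(global_model)
        client.local_update(local)             # train on private, non-IID audio
        client_states.append(local.state_dict())
    # Average parameters across clients and load into the global model.
    avg = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```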
As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prohibits the repurposing of learned models on audio with different sampling rates or alternative representations.
Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem.
In this paper, we propose a simple, unified gradient reweighting scheme, with a lightweight modification to bias the learning process of a model and steer it towards a certain distribution of results.
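A minimal sketch of per-example loss reweighting in this spirit; `example_weights` is a hypothetical stand-in for whatever weighting the scheme derives:

```python
import torch

def reweighted_loss(per_example_losses, example_weights):
    # Scaling each example's loss scales its gradient contribution,
    # biasing learning toward the desired distribution of results.
    w = example_weights / example_weights.sum()
    return (w * per_example_losses).sum()
```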
In this paper, we present an efficient neural network for end-to-end, general-purpose audio source separation.
Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target.
In the first step, we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal.
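A minimal sketch of oracle masking in such a learned latent space, with hypothetical `encode`/`decode` standing in for the transform and its inverse:

```python
def oracle_separate(mixture, src1, src2, encode, decode, eps=1e-8):
    # Oracle ratio mask computed from the true sources, in the latent domain.
    zm, z1, z2 = encode(mixture), encode(src1), encode(src2)
    mask1 = z1.abs() / (z1.abs() + z2.abs() + eps)
    return decode(mask1 * zm)
```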
We show that, by incrementally refining a classifier with generative replay, a generator that is 4% of the size of all previous training data matches the performance of refining the classifier while keeping 20% of all previous training data.
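A minimal sketch of generative replay during refinement; `generator.sample` is a hypothetical interface for drawing pseudo-examples of past classes:

```python
import torch

def replay_batch(generator, new_x, new_y, n_replay):
    # Pseudo-examples of past classes stand in for stored training data.
    with torch.no_grad():
        old_x, old_y = generator.sample(n_replay)
    x = torch.cat([new_x, old_x])  # mix replayed and new-class data
    y = torch.cat([new_y, old_y])
    return x, y
```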
We propose a completely unsupervised method to understand audio scenes observed with random microphone arrangements by decomposing the scene into its constituent sources and their relative presence in each microphone.
We present a monophonic source separation system that is trained by only observing mixtures with no ground truth separation information.
The performance of single-channel source separation algorithms has improved greatly in recent years with the development and deployment of neural networks.
Popular generative model learning methods such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) enforce the latent representation to follow simple distributions such as an isotropic Gaussian.
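The isotropic-Gaussian constraint is typically imposed via an analytic KL term, as in a standard VAE:

```python
import torch

def kl_to_isotropic_gaussian(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.
    return 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar, dim=-1)
```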
We propose a novel speech enhancement method based on a Bayesian formulation of nonnegative matrix factorization (BNMF).
Nonnegative matrix factorization (NMF) has been actively investigated and used in a wide range of problems in the past decade.
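For reference, the standard multiplicative updates for NMF under a Euclidean (Frobenius) loss (Lee & Seung), shown for context rather than tied to any one paper above:

```python
import numpy as np

def nmf(V, rank, n_iters=200, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) as W @ H with W, H >= 0."""
    n, m = V.shape
    W = np.abs(np.random.rand(n, rank))
    H = np.abs(np.random.rand(rank, m))
    for _ in range(n_iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update bases
    return W, H
```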
In this paper, we propose NoiseOut, a fully automated pruning algorithm based on the correlation between activations of neurons in the hidden layers.
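A minimal sketch of the correlation statistic such pruning relies on; the merging of correlated neurons is omitted:

```python
import numpy as np

def most_correlated_pair(activations):
    """Find the most correlated pair of neurons from an (examples x neurons)
    activation matrix; such pairs are candidates for merging/pruning."""
    corr = np.corrcoef(activations, rowvar=False)  # neuron-by-neuron correlations
    np.fill_diagonal(corr, 0.0)                    # ignore self-correlation
    i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
    return i, j, corr[i, j]
```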
Based on the assumption that there exists a neural network that efficiently represents a set of Boolean functions between all binary inputs and outputs, we propose a process for developing and deploying neural networks whose weight parameters, bias terms, inputs, and intermediate hidden-layer outputs are all binary-valued and require only basic bit logic for the feedforward pass.
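A toy illustration of a binary feedforward step using only bit logic (XNOR plus popcount); the packing and threshold here are illustrative, not the paper's deployment pipeline:

```python
def binary_neuron(x_bits, w_bits, n_bits, threshold):
    """One +/-1 binary neuron: XNOR counts matching bits as +1, and the
    popcount-vs-threshold test implements the sign of the dot product."""
    xnor = ~(x_bits ^ w_bits) & ((1 << n_bits) - 1)  # matching bits -> 1
    popcount = bin(xnor).count("1")
    return 1 if popcount >= threshold else 0
```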
We argue that due to the specific structure of the activation matrix $R$ in the shared component factorial mixture model, and an incoherence assumption on the shared component, it is possible to extract the columns of the $O$ matrix without the need for alternating between the estimation of $O$ and $R$.
In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising.
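A generic sketch of the kind of soft time-frequency masking layer that is jointly optimized with the network (not the paper's exact formulation):

```python
import torch

def apply_masks(mix_spec, est1, est2, eps=1e-8):
    # Normalize the two source estimates so the masks sum to one per
    # time-frequency bin, then apply them to the mixture spectrogram.
    denom = est1.abs() + est2.abs() + eps
    m1, m2 = est1.abs() / denom, est2.abs() / denom
    return m1 * mix_spec, m2 * mix_spec
```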