Search Results for author: Paris Smaragdis

Found 45 papers, 25 papers with code

Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations

1 code implementation • 5 Apr 2024 • Krishna Subramani, Paris Smaragdis, Takuya Higuchi, Mehrez Souden

Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix.
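For context, here is a minimal sketch of classical NMF with multiplicative updates; the paper's implicit-neural-representation variant is not reproduced, and the shapes and rank are illustrative:

```python
import numpy as np

def nmf(V, rank=8, n_iter=200, eps=1e-9):
    """Factorize a non-negative matrix V ~= W @ H with multiplicative updates."""
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases
    return W, H

V = np.abs(np.random.randn(64, 100))   # e.g. a magnitude spectrogram
W, H = nmf(V)
```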

Sound Source Separation Using Latent Variational Block-Wise Disentanglement

no code implementations • 8 Feb 2024 • Karim Helwani, Masahito Togami, Paris Smaragdis, Michael M. Goodwin

In this paper, we present a hybrid classical digital signal processing/deep neural network (DSP/DNN) approach to source separation (SS) highlighting the theoretical link between variational autoencoder and classical approaches to SS.

Disentanglement

Audio Editing with Non-Rigid Text Prompts

no code implementations • 19 Oct 2023 • Francesco Paissan, Zhepei Wang, Mirco Ravanelli, Paris Smaragdis, Cem Subakan

We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio.

Audio Generation, Style Transfer

Mechatronic Generation of Datasets for Acoustics Research

no code implementations • 1 Oct 2023 • Austin Lu, Ethaniel Moore, Arya Nallanthighall, Kanad Sarkar, Manan Mittal, Ryan M. Corey, Paris Smaragdis, Andrew Singer

We address the challenge of making spatial audio datasets by proposing a shared mechanized recording space that can run custom acoustic experiments: a Mechatronic Acoustic Research System (MARS).

Noise-Robust DSP-Assisted Neural Pitch Estimation with Very Low Complexity

no code implementations • 25 Sep 2023 • Krishna Subramani, Jean-Marc Valin, Jan Buethe, Paris Smaragdis, Mike Goodwin

Pitch estimation is an essential step of many speech processing algorithms, including speech coding, synthesis, and enhancement.

Complete and separate: Conditional separation with missing target source attribute completion

no code implementations • 27 Jul 2023 • Dimitrios Bralios, Efthymios Tzinis, Paris Smaragdis

Recent approaches in source separation leverage semantic information about their input mixtures and constituent sources that, when used in conditional separation models, can achieve impressive performance.

Attribute

Unsupervised Improvement of Audio-Text Cross-Modal Representations

1 code implementation • 3 May 2023 • Zhepei Wang, Cem Subakan, Krishna Subramani, Junkai Wu, Tiago Tavares, Fabio Ayres, Paris Smaragdis

In this paper, we study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio.

Acoustic Scene Classification, Classification +2

A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

no code implementations • 23 Feb 2023 • Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis

In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement.

Multi-Task Learning, Speech Enhancement

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

no code implementations • 8 Dec 2022 • Ahmed Mustafa, Jean-Marc Valin, Jan Büthe, Paris Smaragdis, Mike Goodwin

GAN vocoders are currently one of the state-of-the-art methods for building high-quality neural waveform generative models.

Latent Iterative Refinement for Modular Source Separation

1 code implementation • 22 Nov 2022 • Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

During inference, we can dynamically adjust how many processing blocks and iterations of a specific block an input signal needs using a gating module.

Optimal Condition Training for Target Source Separation

1 code implementation • 11 Nov 2022 • Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries.

Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model

1 code implementation • 11 May 2022 • Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy

As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC).

Packet Loss Concealment, Speech Synthesis

Heterogeneous Target Speech Separation

no code implementations • 7 Apr 2022 • Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux

We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.).

Speech Separation

End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation

1 code implementation • 23 Feb 2022 • Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

Neural vocoders have recently demonstrated high quality speech synthesis, but typically require a high computational complexity.

Speech Synthesis

Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet

2 code implementations • 22 Feb 2022 • Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

Neural speech synthesis models can synthesize high quality speech but typically require a high computational complexity to do so.

Speech Synthesis

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

2 code implementations • 17 Feb 2022 • Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

RemixIT is based on a continuous self-training scheme in which a pre-trained teacher model on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.
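A hedged sketch of this bootstrapped-remixing idea in PyTorch-style code; the two-output teacher/student models, the loss function, and the optimizer are assumptions for illustration, not the paper's exact training recipe:

```python
import torch

def remixit_step(teacher, student, mixtures, optimizer, loss_fn):
    # Teacher (pre-trained out-of-domain) produces pseudo-targets for in-domain mixtures.
    with torch.no_grad():
        est_speech, est_noise = teacher(mixtures)
    # Bootstrapped remixing: pair each speech estimate with a permuted noise estimate.
    perm = torch.randperm(est_noise.shape[0])
    remixed = est_speech + est_noise[perm]
    # Student is trained to recover the pseudo-targets from the new mixtures.
    pred_speech, pred_noise = student(remixed)
    loss = loss_fn(pred_speech, est_speech) + loss_fn(pred_noise, est_noise[perm])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```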

Speech Enhancement, Unsupervised Domain Adaptation

Differentiable Signal Processing With Black-Box Audio Effects

2 code implementations • 11 May 2021 • Marco A. Martínez Ramírez, Oliver Wang, Paris Smaragdis, Nicholas J. Bryan

We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network.
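One way to make such an opaque effect usable inside a network is to approximate its gradient numerically. The sketch below is loosely SPSA-style and only an assumption about how this could be wired up; `BlackBoxFx` and the lambda effect are hypothetical stand-ins, not the paper's implementation:

```python
import torch

class BlackBoxFx(torch.autograd.Function):
    @staticmethod
    def forward(ctx, audio, params, effect, eps=1e-3):
        ctx.save_for_backward(audio, params)
        ctx.effect, ctx.eps = effect, eps
        return effect(audio, params)          # opaque, non-differentiable processor

    @staticmethod
    def backward(ctx, grad_out):
        audio, params = ctx.saved_tensors
        effect, eps = ctx.effect, ctx.eps
        # Simultaneous perturbation: one +/- evaluation pair estimates the parameter gradient.
        delta = torch.sign(torch.randn_like(params))
        y_plus = effect(audio, params + eps * delta)
        y_minus = effect(audio, params - eps * delta)
        grad_params = ((grad_out * (y_plus - y_minus)).sum() / (2 * eps)) * delta
        return None, grad_params, None, None

# Usage with a hypothetical one-parameter "effect":
audio = torch.randn(1, 16000)
gain = torch.tensor([0.5], requires_grad=True)
out = BlackBoxFx.apply(audio, gain, lambda a, p: a * p)
```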

Audio Signal Processing

Separate but Together: Unsupervised Federated Learning for Speech Enhancement from Non-IID Data

1 code implementation • 11 May 2021 • Efthymios Tzinis, Jonah Casebeer, Zhepei Wang, Paris Smaragdis

We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients.

Federated Learning, Speech Enhancement +1

Point Cloud Audio Processing

1 code implementation • 6 May 2021 • Krishna Subramani, Paris Smaragdis

As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prohibits the repurposing of learned models on audio with different sampling rates or alternative representations.

BIG-bench Machine Learning

Compute and memory efficient universal sound source separation

3 code implementations • 3 Mar 2021 • Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis

Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem.

Audio Source Separation, Efficient Neural Network +1

Optimizing Short-Time Fourier Transform Parameters via Gradient Descent

1 code implementation • 28 Oct 2020 • An Zhao, Krishna Subramani, Paris Smaragdis

The Short-Time Fourier Transform (STFT) has been a staple of signal processing, often being the first step for many audio tasks.
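A minimal sketch of the general idea of tuning an STFT parameter by gradient descent, here a learnable Gaussian window width in PyTorch; the loss, target, and parameterization are placeholders rather than the paper's formulation:

```python
import torch

def gaussian_window(n_fft, sigma):
    t = torch.arange(n_fft, dtype=torch.float32) - (n_fft - 1) / 2
    return torch.exp(-0.5 * (t / (sigma * n_fft)) ** 2)

n_fft, hop = 512, 128
log_sigma = torch.tensor(-1.0, requires_grad=True)        # learnable window width
opt = torch.optim.Adam([log_sigma], lr=1e-2)

x = torch.randn(1, 16000)                                  # stand-in waveform
target = torch.rand(1, n_fft // 2 + 1, 16000 // hop + 1)   # stand-in target magnitudes

for _ in range(100):
    win = gaussian_window(n_fft, torch.exp(log_sigma))
    spec = torch.stft(x, n_fft, hop_length=hop, window=win, return_complex=True)
    loss = (spec.abs() - target[..., : spec.shape[-1]]).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```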

Unified Gradient Reweighting for Model Biasing with Applications to Source Separation

1 code implementation • 25 Oct 2020 • Efthymios Tzinis, Dimitrios Bralios, Paris Smaragdis

In this paper, we propose a simple, unified gradient reweighting scheme, with a lightweight modification to bias the learning process of a model and steer it towards a certain distribution of results.
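A hedged sketch of the basic mechanism of reweighting per-example losses (and hence their gradients) to bias training toward a chosen distribution of results; the weights and losses below are illustrative:

```python
import torch

def reweighted_loss(per_example_losses, weights):
    """Scale each example's loss, and therefore its gradient, by a user-chosen weight."""
    weights = weights / weights.sum()            # normalize so the overall scale is unchanged
    return (weights * per_example_losses).sum()

losses = torch.tensor([0.9, 0.2, 0.5], requires_grad=True)  # stand-in per-example losses
priority = torch.tensor([3.0, 1.0, 1.0])                    # e.g. favor the first source/class
reweighted_loss(losses, priority).backward()
print(losses.grad)                               # gradients scaled by the normalized weights
```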

Audio Source Separation

Self-supervised Learning for Speech Enhancement

1 code implementation • 18 Jun 2020 • Yu-Che Wang, Shrikant Venkataramani, Paris Smaragdis

Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target.

Audio and Speech Processing, Sound

Two-Step Sound Source Separation: Training on Learned Latent Targets

2 code implementations • 22 Oct 2019 • Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis

In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal.

Speech Separation, Vocal Bursts Valence Prediction

Continual Learning of New Sound Classes using Generative Replay

no code implementations • 3 Jun 2019 • Zhepei Wang, Cem Subakan, Efthymios Tzinis, Paris Smaragdis, Laurent Charlin

We show that, when a classifier is incrementally refined with generative replay, a generator that is only 4% of the size of all previous training data matches the performance of refining the classifier while keeping 20% of all previous training data.

Continual Learning, Sound Classification

Deep Tensor Factorization for Spatially-Aware Scene Decomposition

no code implementations • 3 May 2019 • Jonah Casebeer, Michael Colomb, Paris Smaragdis

We propose a completely unsupervised method to understand audio scenes observed with random microphone arrangements by decomposing the scene into its constituent sources and their relative presence in each microphone.

Clustering

Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures using Spatial Information

1 code implementation • 5 Nov 2018 • Efthymios Tzinis, Shrikant Venkataramani, Paris Smaragdis

We present a monophonic source separation system that is trained by only observing mixtures with no ground truth separation information.

Clustering, Deep Clustering +2

End-to-end Networks for Supervised Single-channel Speech Separation

no code implementations • 5 Oct 2018 • Shrikant Venkataramani, Paris Smaragdis

The performance of single channel source separation algorithms has improved greatly in recent times with the development and deployment of neural networks.

Speech Separation

Learning the Base Distribution in Implicit Generative Models

no code implementations • 12 Mar 2018 • Cem Subakan, Oluwasanmi Koyejo, Paris Smaragdis

Popular generative model learning methods such as Generative Adversarial Networks (GANs), and Variational Autoencoders (VAE) enforce the latent representation to follow simple distributions such as isotropic Gaussian.

Generative Adversarial Source Separation

1 code implementation • 30 Oct 2017 • Cem Subakan, Paris Smaragdis

Generative source separation methods, such as non-negative matrix factorization (NMF) or auto-encoders, rely on the assumption of an output probability density.

A State-Space Approach to Dynamic Nonnegative Matrix Factorization

no code implementations • 31 Aug 2017 • Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo

Nonnegative matrix factorization (NMF) has been actively investigated and used in a wide range of problems in the past decade.

Time Series, Time Series Analysis

End-to-end Source Separation with Adaptive Front-Ends

1 code implementation • 6 May 2017 • Shrikant Venkataramani, Jonah Casebeer, Paris Smaragdis

We present an auto-encoder neural network that can act as an equivalent to short-time front-end transforms.
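A hedged sketch of a learned convolutional front end acting like a short-time transform: a 1-D conv "analysis" stage and a transposed-conv "synthesis" stage. Sizes and the non-negativity choice are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn

class AdaptiveFrontEnd(nn.Module):
    def __init__(self, n_basis=256, win=512, hop=128):
        super().__init__()
        self.analysis = nn.Conv1d(1, n_basis, kernel_size=win, stride=hop, bias=False)
        self.synthesis = nn.ConvTranspose1d(n_basis, 1, kernel_size=win, stride=hop, bias=False)

    def forward(self, x):                        # x: (batch, 1, samples)
        latent = torch.relu(self.analysis(x))    # non-negative, spectrogram-like representation
        return self.synthesis(latent), latent

model = AdaptiveFrontEnd()
x = torch.randn(2, 1, 16384)
x_hat, latent = model(x)                         # x_hat has the same length as x
```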

Sound

Diagonal RNNs in Symbolic Music Modeling

1 code implementation • 18 Apr 2017 • Y. Cem Subakan, Paris Smaragdis

In this paper, we propose a new Recurrent Neural Network (RNN) architecture.
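A minimal sketch of a diagonal recurrence, where the recurrent weight matrix is restricted to its diagonal so the state update is an element-wise product instead of a full matrix multiply; the input size (e.g., an 88-note piano roll) is an assumption:

```python
import torch
import torch.nn as nn

class DiagonalRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w_in = nn.Linear(input_size, hidden_size)
        self.w_rec = nn.Parameter(torch.rand(hidden_size))   # diagonal recurrent weights

    def forward(self, x, h):
        return torch.tanh(self.w_in(x) + self.w_rec * h)

cell = DiagonalRNNCell(88, 128)
h = torch.zeros(1, 128)
for x_t in torch.randn(50, 1, 88):   # 50 time steps of a stand-in piano-roll sequence
    h = cell(x_t, h)
```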

Music Modeling

NoiseOut: A Simple Way to Prune Neural Networks

no code implementations • 18 Nov 2016 • Mohammad Babaeizadeh, Paris Smaragdis, Roy H. Campbell

In this paper, we propose NoiseOut, a fully automated pruning algorithm based on the correlation between activations of neurons in the hidden layers.
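A hedged sketch of the core observation this exploits: if two hidden units' activations are highly correlated, one can be removed and the other's outgoing weights adjusted to compensate. The merge rule below is a simplification for illustration, not the full NoiseOut procedure:

```python
import numpy as np

def prune_most_correlated(acts, w_out):
    """acts: (samples, hidden) activations; w_out: (hidden, out) next-layer weights."""
    corr = np.corrcoef(acts.T)
    np.fill_diagonal(corr, 0.0)
    i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)   # most correlated pair
    # Fit acts[:, j] ~= a * acts[:, i] + b, then fold neuron j's contribution into neuron i.
    a, b = np.polyfit(acts[:, i], acts[:, j], 1)
    w_out[i] += a * w_out[j]
    bias_shift = b * w_out[j]            # would be added to the next layer's bias
    keep = [k for k in range(acts.shape[1]) if k != j]
    return w_out[keep], keep, bias_shift
```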

Bitwise Neural Networks

no code implementations • 22 Jan 2016 • Minje Kim, Paris Smaragdis

Based on the assumption that there exists a neural network that efficiently represents a set of Boolean functions between all binary inputs and outputs, we propose a process for developing and deploying neural networks whose weight parameters, bias terms, input, and intermediate hidden layer output signals are all binary-valued, and require only basic bit logic for the feedforward pass.
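A small illustration of why bit logic suffices: with ±1 weights and ±1 activations, a dot product reduces to an XNOR followed by a popcount. This is a generic demonstration of the identity, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(0)
x_bits = rng.integers(0, 2, 64)    # input bits  (1 <-> +1, 0 <-> -1)
w_bits = rng.integers(0, 2, 64)    # weight bits

# Bit-logic version: XNOR then popcount.
xnor = ~(x_bits ^ w_bits) & 1
dot_bitwise = 2 * xnor.sum() - len(x_bits)

# Arithmetic version with explicit +/-1 values, for comparison.
dot_float = ((2 * x_bits - 1) * (2 * w_bits - 1)).sum()
assert dot_bitwise == dot_float
```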

A Dictionary Learning Approach for Factorial Gaussian Models

no code implementations • 18 Aug 2015 • Y. Cem Subakan, Johannes Traa, Paris Smaragdis, Noah Stein

We argue that due to the specific structure of the activation matrix $R$ in the shared component factorial mixture model, and an incoherence assumption on the shared component, it is possible to extract the columns of the $O$ matrix without the need for alternating between the estimation of $O$ and $R$.

Dictionary Learning

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

2 code implementations • 13 Feb 2015 • Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising.
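A hedged sketch of the joint mask/network idea for two sources: a recurrent network predicts soft time-frequency masks, the masks are applied to the mixture spectrogram inside the model, and the loss is computed on the masked source estimates. The GRU, layer sizes, and loss are assumptions standing in for the paper's deep recurrent architecture:

```python
import torch
import torch.nn as nn

class MaskRNN(nn.Module):
    def __init__(self, n_freq=513, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, 2 * n_freq)

    def forward(self, mix_mag):                              # (batch, frames, freq)
        h, _ = self.rnn(mix_mag)
        masks = torch.softmax(self.out(h).view(*mix_mag.shape[:2], 2, -1), dim=2)
        return masks * mix_mag.unsqueeze(2)                  # masked source estimates

model = MaskRNN()
mix = torch.rand(4, 100, 513)
s1, s2 = torch.rand(4, 100, 513), torch.rand(4, 100, 513)   # stand-in source targets
est = model(mix)                                             # (batch, frames, 2, freq)
loss = ((est[:, :, 0] - s1) ** 2 + (est[:, :, 1] - s2) ** 2).mean()
```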

Denoising, Speech Denoising +1

Spectral Learning of Mixture of Hidden Markov Models

no code implementations • NeurIPS 2014 • Cem Subakan, Johannes Traa, Paris Smaragdis

In this paper, we propose a learning approach for the Mixture of Hidden Markov Models (MHMM) based on the Method of Moments (MoM).

Sparse Overcomplete Latent Variable Decomposition of Counts Data

no code implementations • NeurIPS 2007 • Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis

An important problem in many fields is the analysis of counts data to extract meaningful latent components.
