Search Results for author: Paris Smaragdis

Found 45 papers, 25 papers with code

Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations

1 code implementation • 5 Apr 2024 • Krishna Subramani, Paris Smaragdis, Takuya Higuchi, Mehrez Souden

Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix.
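For context, here is a minimal sketch of classical NMF with multiplicative updates; the paper's implicit-neural-representation variant is not reproduced, and the shapes and rank are illustrative:

```python
import numpy as np

def nmf(V, rank=8, n_iter=200, eps=1e-9):
    """Factorize a non-negative matrix V ~= W @ H with multiplicative updates."""
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases
    return W, H

V = np.abs(np.random.randn(64, 100))   # e.g. a magnitude spectrogram
W, H = nmf(V)
```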

Sound Source Separation Using Latent Variational Block-Wise Disentanglement

no code implementations • 8 Feb 2024 • Karim Helwani, Masahito Togami, Paris Smaragdis, Michael M. Goodwin

In this paper, we present a hybrid classical digital signal processing/deep neural network (DSP/DNN) approach to source separation (SS) highlighting the theoretical link between variational autoencoder and classical approaches to SS.

Disentanglement

Audio Editing with Non-Rigid Text Prompts

no code implementations • 19 Oct 2023 • Francesco Paissan, Zhepei Wang, Mirco Ravanelli, Paris Smaragdis, Cem Subakan

We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio.

Audio Generation, Style Transfer

Mechatronic Generation of Datasets for Acoustics Research

no code implementations • 1 Oct 2023 • Austin Lu, Ethaniel Moore, Arya Nallanthighall, Kanad Sarkar, Manan Mittal, Ryan M. Corey, Paris Smaragdis, Andrew Singer

We address the challenge of making spatial audio datasets by proposing a shared mechanized recording space that can run custom acoustic experiments: a Mechatronic Acoustic Research System (MARS).

Noise-Robust DSP-Assisted Neural Pitch Estimation with Very Low Complexity

no code implementations • 25 Sep 2023 • Krishna Subramani, Jean-Marc Valin, Jan Buethe, Paris Smaragdis, Mike Goodwin

Pitch estimation is an essential step of many speech processing algorithms, including speech coding, synthesis, and enhancement.

Complete and separate: Conditional separation with missing target source attribute completion

no code implementations • 27 Jul 2023 • Dimitrios Bralios, Efthymios Tzinis, Paris Smaragdis

Recent approaches in source separation leverage semantic information about their input mixtures and constituent sources that, when used in conditional separation models, can achieve impressive performance.

Attribute

Unsupervised Improvement of Audio-Text Cross-Modal Representations

1 code implementation • 3 May 2023 • Zhepei Wang, Cem Subakan, Krishna Subramani, Junkai Wu, Tiago Tavares, Fabio Ayres, Paris Smaragdis

In this paper, we study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio.

Acoustic Scene Classification, Classification +2

A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

no code implementations • 23 Feb 2023 • Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis

In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement.

Multi-Task Learning, Speech Enhancement

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

no code implementations • 8 Dec 2022 • Ahmed Mustafa, Jean-Marc Valin, Jan Büthe, Paris Smaragdis, Mike Goodwin

GAN vocoders are currently one of the state-of-the-art methods for building high-quality neural waveform generative models.

Latent Iterative Refinement for Modular Source Separation

1 code implementation • 22 Nov 2022 • Dimitrios Bralios, Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

During inference, we can dynamically adjust how many processing blocks and iterations of a specific block an input signal needs using a gating module.

Optimal Condition Training for Target Source Separation

1 code implementation • 11 Nov 2022 • Efthymios Tzinis, Gordon Wichern, Paris Smaragdis, Jonathan Le Roux

Recent research has shown remarkable performance in leveraging multiple extraneous conditional and non-mutually exclusive semantic concepts for sound source separation, allowing the flexibility to extract a given target source based on multiple different queries.

Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model

1 code implementation • 11 May 2022 • Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy

As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC).

Packet Loss Concealment, Speech Synthesis

Heterogeneous Target Speech Separation

no code implementations • 7 Apr 2022 • Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux

We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.).

Speech Separation

End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation

1 code implementation • 23 Feb 2022 • Krishna Subramani, Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

Neural vocoders have recently demonstrated high quality speech synthesis, but typically require a high computational complexity.

Speech Synthesis

Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet

2 code implementations • 22 Feb 2022 • Jean-Marc Valin, Umut Isik, Paris Smaragdis, Arvindh Krishnaswamy

Neural speech synthesis models can synthesize high quality speech but typically require a high computational complexity to do so.

Speech Synthesis

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

2 code implementations • 17 Feb 2022 • Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

RemixIT is based on a continuous self-training scheme in which a pre-trained teacher model on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures.
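A hedged sketch of this bootstrapped-remixing idea in PyTorch-style code; the two-output teacher/student models, the loss function, and the optimizer are assumptions for illustration, not the paper's exact training recipe:

```python
import torch

def remixit_step(teacher, student, mixtures, optimizer, loss_fn):
    # Teacher (pre-trained out-of-domain) produces pseudo-targets for in-domain mixtures.
    with torch.no_grad():
        est_speech, est_noise = teacher(mixtures)
    # Bootstrapped remixing: pair each speech estimate with a permuted noise estimate.
    perm = torch.randperm(est_noise.shape[0])
    remixed = est_speech + est_noise[perm]
    # Student is trained to recover the pseudo-targets from the new mixtures.
    pred_speech, pred_noise = student(remixed)
    loss = loss_fn(pred_speech, est_speech) + loss_fn(pred_noise, est_noise[perm])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```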

Speech Enhancement, Unsupervised Domain Adaptation

Differentiable Signal Processing With Black-Box Audio Effects

2 code implementations • 11 May 2021 • Marco A. Martínez Ramírez, Oliver Wang, Paris Smaragdis, Nicholas J. Bryan

We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network.
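One way to make such an opaque effect usable inside a network is to approximate its gradient numerically. The sketch below is loosely SPSA-style and only an assumption about how this could be wired up; `BlackBoxFx` and the lambda effect are hypothetical stand-ins, not the paper's implementation:

```python
import torch

class BlackBoxFx(torch.autograd.Function):
    @staticmethod
    def forward(ctx, audio, params, effect, eps=1e-3):
        ctx.save_for_backward(audio, params)
        ctx.effect, ctx.eps = effect, eps
        return effect(audio, params)          # opaque, non-differentiable processor

    @staticmethod
    def backward(ctx, grad_out):
        audio, params = ctx.saved_tensors
        effect, eps = ctx.effect, ctx.eps
        # Simultaneous perturbation: one +/- evaluation pair estimates the parameter gradient.
        delta = torch.sign(torch.randn_like(params))
        y_plus = effect(audio, params + eps * delta)
        y_minus = effect(audio, params - eps * delta)
        grad_params = ((grad_out * (y_plus - y_minus)).sum() / (2 * eps)) * delta
        return None, grad_params, None, None

# Usage with a hypothetical one-parameter "effect":
audio = torch.randn(1, 16000)
gain = torch.tensor([0.5], requires_grad=True)
out = BlackBoxFx.apply(audio, gain, lambda a, p: a * p)
```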

Audio Signal Processing

Separate but Together: Unsupervised Federated Learning for Speech Enhancement from Non-IID Data

1 code implementation • 11 May 2021 • Efthymios Tzinis, Jonah Casebeer, Zhepei Wang, Paris Smaragdis

We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients.

Federated Learning, Speech Enhancement +1

Point Cloud Audio Processing

1 code implementation • 6 May 2021 • Krishna Subramani, Paris Smaragdis

As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prohibits the repurposing of learned models on audio with different sampling rates or alternative representations.

BIG-bench Machine Learning

Compute and memory efficient universal sound source separation

3 code implementations • 3 Mar 2021 • Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis

Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem.

Audio Source Separation, Efficient Neural Network +1

Optimizing Short-Time Fourier Transform Parameters via Gradient Descent

1 code implementation • 28 Oct 2020 • An Zhao, Krishna Subramani, Paris Smaragdis

The Short-Time Fourier Transform (STFT) has been a staple of signal processing, often being the first step for many audio tasks.
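A minimal sketch of the general idea of tuning an STFT parameter by gradient descent, here a learnable Gaussian window width in PyTorch; the loss, target, and parameterization are placeholders rather than the paper's formulation:

```python
import torch

def gaussian_window(n_fft, sigma):
    t = torch.arange(n_fft, dtype=torch.float32) - (n_fft - 1) / 2
    return torch.exp(-0.5 * (t / (sigma * n_fft)) ** 2)

n_fft, hop = 512, 128
log_sigma = torch.tensor(-1.0, requires_grad=True)        # learnable window width
opt = torch.optim.Adam([log_sigma], lr=1e-2)

x = torch.randn(1, 16000)                                  # stand-in waveform
target = torch.rand(1, n_fft // 2 + 1, 16000 // hop + 1)   # stand-in target magnitudes

for _ in range(100):
    win = gaussian_window(n_fft, torch.exp(log_sigma))
    spec = torch.stft(x, n_fft, hop_length=hop, window=win, return_complex=True)
    loss = (spec.abs() - target[..., : spec.shape[-1]]).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```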

Unified Gradient Reweighting for Model Biasing with Applications to Source Separation

1 code implementation • 25 Oct 2020 • Efthymios Tzinis, Dimitrios Bralios, Paris Smaragdis

In this paper, we propose a simple, unified gradient reweighting scheme, with a lightweight modification to bias the learning process of a model and steer it towards a certain distribution of results.
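A hedged sketch of the basic mechanism of reweighting per-example losses (and hence their gradients) to bias training toward a chosen distribution of results; the weights and losses below are illustrative:

```python
import torch

def reweighted_loss(per_example_losses, weights):
    """Scale each example's loss, and therefore its gradient, by a user-chosen weight."""
    weights = weights / weights.sum()            # normalize so the overall scale is unchanged
    return (weights * per_example_losses).sum()

losses = torch.tensor([0.9, 0.2, 0.5], requires_grad=True)  # stand-in per-example losses
priority = torch.tensor([3.0, 1.0, 1.0])                    # e.g. favor the first source/class
reweighted_loss(losses, priority).backward()
print(losses.grad)                               # gradients scaled by the normalized weights
```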

Audio Source Separation

Self-supervised Learning for Speech Enhancement

1 code implementation • 18 Jun 2020 • Yu-Che Wang, Shrikant Venkataramani, Paris Smaragdis

Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target.

Audio and Speech Processing, Sound

Two-Step Sound Source Separation: Training on Learned Latent Targets

2 code implementations • 22 Oct 2019 • Efthymios Tzinis, Shrikant Venkataramani, Zhepei Wang, Cem Subakan, Paris Smaragdis

In the first step we learn a transform (and its inverse) to a latent space where masking-based separation performance using oracles is optimal.

Speech Separation, Vocal Bursts Valence Prediction

Continual Learning of New Sound Classes using Generative Replay

no code implementations • 3 Jun 2019 • Zhepei Wang, Cem Subakan, Efthymios Tzinis, Paris Smaragdis, Laurent Charlin

We show that, when a classifier is incrementally refined with generative replay, a generator that is only 4% of the size of all previous training data matches the performance of refining the classifier while keeping 20% of all previous training data.

Continual Learning, Sound Classification

Deep Tensor Factorization for Spatially-Aware Scene Decomposition

no code implementations • 3 May 2019 • Jonah Casebeer, Michael Colomb, Paris Smaragdis

We propose a completely unsupervised method to understand audio scenes observed with random microphone arrangements by decomposing the scene into its constituent sources and their relative presence in each microphone.

Clustering

Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures using Spatial Information

1 code implementation • 5 Nov 2018 • Efthymios Tzinis, Shrikant Venkataramani, Paris Smaragdis

We present a monophonic source separation system that is trained by only observing mixtures with no ground truth separation information.

Clustering, Deep Clustering +2

End-to-end Networks for Supervised Single-channel Speech Separation

no code implementations • 5 Oct 2018 • Shrikant Venkataramani, Paris Smaragdis

The performance of single channel source separation algorithms has improved greatly in recent times with the development and deployment of neural networks.

Speech Separation

Learning the Base Distribution in Implicit Generative Models

no code implementations • 12 Mar 2018 • Cem Subakan, Oluwasanmi Koyejo, Paris Smaragdis

Popular generative model learning methods such as Generative Adversarial Networks (GANs), and Variational Autoencoders (VAE) enforce the latent representation to follow simple distributions such as isotropic Gaussian.

Generative Adversarial Source Separation

1 code implementation • 30 Oct 2017 • Cem Subakan, Paris Smaragdis

Generative source separation methods, such as non-negative matrix factorization (NMF) or auto-encoders, rely on the assumption of an output probability density.

A State-Space Approach to Dynamic Nonnegative Matrix Factorization

no code implementations • 31 Aug 2017 • Nasser Mohammadiha, Paris Smaragdis, Ghazaleh Panahandeh, Simon Doclo

Nonnegative matrix factorization (NMF) has been actively investigated and used in a wide range of problems in the past decade.

Time Series, Time Series Analysis

End-to-end Source Separation with Adaptive Front-Ends

1 code implementation • 6 May 2017 • Shrikant Venkataramani, Jonah Casebeer, Paris Smaragdis

We present an auto-encoder neural network that can act as an equivalent to short-time front-end transforms.
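A hedged sketch of a learned convolutional front end acting like a short-time transform: a 1-D conv "analysis" stage and a transposed-conv "synthesis" stage. Sizes and the non-negativity choice are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn

class AdaptiveFrontEnd(nn.Module):
    def __init__(self, n_basis=256, win=512, hop=128):
        super().__init__()
        self.analysis = nn.Conv1d(1, n_basis, kernel_size=win, stride=hop, bias=False)
        self.synthesis = nn.ConvTranspose1d(n_basis, 1, kernel_size=win, stride=hop, bias=False)

    def forward(self, x):                        # x: (batch, 1, samples)
        latent = torch.relu(self.analysis(x))    # non-negative, spectrogram-like representation
        return self.synthesis(latent), latent

model = AdaptiveFrontEnd()
x = torch.randn(2, 1, 16384)
x_hat, latent = model(x)                         # x_hat has the same length as x
```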

Sound

Diagonal RNNs in Symbolic Music Modeling

1 code implementation • 18 Apr 2017 • Y. Cem Subakan, Paris Smaragdis

In this paper, we propose a new Recurrent Neural Network (RNN) architecture.
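A minimal sketch of a diagonal recurrence, where the recurrent weight matrix is restricted to its diagonal so the state update is an element-wise product instead of a full matrix multiply; the input size (e.g., an 88-note piano roll) is an assumption:

```python
import torch
import torch.nn as nn

class DiagonalRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w_in = nn.Linear(input_size, hidden_size)
        self.w_rec = nn.Parameter(torch.rand(hidden_size))   # diagonal recurrent weights

    def forward(self, x, h):
        return torch.tanh(self.w_in(x) + self.w_rec * h)

cell = DiagonalRNNCell(88, 128)
h = torch.zeros(1, 128)
for x_t in torch.randn(50, 1, 88):   # 50 time steps of a stand-in piano-roll sequence
    h = cell(x_t, h)
```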

Music Modeling

NoiseOut: A Simple Way to Prune Neural Networks

no code implementations • 18 Nov 2016 • Mohammad Babaeizadeh, Paris Smaragdis, Roy H. Campbell

In this paper, we propose NoiseOut, a fully automated pruning algorithm based on the correlation between activations of neurons in the hidden layers.
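A hedged sketch of the core observation this exploits: if two hidden units' activations are highly correlated, one can be removed and the other's outgoing weights adjusted to compensate. The merge rule below is a simplification for illustration, not the full NoiseOut procedure:

```python
import numpy as np

def prune_most_correlated(acts, w_out):
    """acts: (samples, hidden) activations; w_out: (hidden, out) next-layer weights."""
    corr = np.corrcoef(acts.T)
    np.fill_diagonal(corr, 0.0)
    i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)   # most correlated pair
    # Fit acts[:, j] ~= a * acts[:, i] + b, then fold neuron j's contribution into neuron i.
    a, b = np.polyfit(acts[:, i], acts[:, j], 1)
    w_out[i] += a * w_out[j]
    bias_shift = b * w_out[j]            # would be added to the next layer's bias
    keep = [k for k in range(acts.shape[1]) if k != j]
    return w_out[keep], keep, bias_shift
```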

Bitwise Neural Networks

no code implementations • 22 Jan 2016 • Minje Kim, Paris Smaragdis

Based on the assumption that there exists a neural network that efficiently represents a set of Boolean functions between all binary inputs and outputs, we propose a process for developing and deploying neural networks whose weight parameters, bias terms, input, and intermediate hidden layer output signals are all binary-valued, and require only basic bit logic for the feedforward pass.
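A small illustration of why bit logic suffices: with ±1 weights and ±1 activations, a dot product reduces to an XNOR followed by a popcount. This is a generic demonstration of the identity, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(0)
x_bits = rng.integers(0, 2, 64)    # input bits  (1 <-> +1, 0 <-> -1)
w_bits = rng.integers(0, 2, 64)    # weight bits

# Bit-logic version: XNOR then popcount.
xnor = ~(x_bits ^ w_bits) & 1
dot_bitwise = 2 * xnor.sum() - len(x_bits)

# Arithmetic version with explicit +/-1 values, for comparison.
dot_float = ((2 * x_bits - 1) * (2 * w_bits - 1)).sum()
assert dot_bitwise == dot_float
```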

A Dictionary Learning Approach for Factorial Gaussian Models

no code implementations • 18 Aug 2015 • Y. Cem Subakan, Johannes Traa, Paris Smaragdis, Noah Stein

We argue that due to the specific structure of the activation matrix $R$ in the shared component factorial mixture model, and an incoherence assumption on the shared component, it is possible to extract the columns of the $O$ matrix without the need for alternating between the estimation of $O$ and $R$.

Dictionary Learning

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

2 code implementations • 13 Feb 2015 • Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising.
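A hedged sketch of the joint mask/network idea for two sources: a recurrent network predicts soft time-frequency masks, the masks are applied to the mixture spectrogram inside the model, and the loss is computed on the masked source estimates. The GRU, layer sizes, and loss are assumptions standing in for the paper's deep recurrent architecture:

```python
import torch
import torch.nn as nn

class MaskRNN(nn.Module):
    def __init__(self, n_freq=513, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, 2 * n_freq)

    def forward(self, mix_mag):                              # (batch, frames, freq)
        h, _ = self.rnn(mix_mag)
        masks = torch.softmax(self.out(h).view(*mix_mag.shape[:2], 2, -1), dim=2)
        return masks * mix_mag.unsqueeze(2)                  # masked source estimates

model = MaskRNN()
mix = torch.rand(4, 100, 513)
s1, s2 = torch.rand(4, 100, 513), torch.rand(4, 100, 513)   # stand-in source targets
est = model(mix)                                             # (batch, frames, 2, freq)
loss = ((est[:, :, 0] - s1) ** 2 + (est[:, :, 1] - s2) ** 2).mean()
```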

Denoising, Speech Denoising +1

Spectral Learning of Mixture of Hidden Markov Models

no code implementations • NeurIPS 2014 • Cem Subakan, Johannes Traa, Paris Smaragdis

In this paper, we propose a learning approach for the Mixture of Hidden Markov Models (MHMM) based on the Method of Moments (MoM).

Sparse Overcomplete Latent Variable Decomposition of Counts Data

no code implementations • NeurIPS 2007 • Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis

An important problem in many fields is the analysis of counts data to extract meaningful latent components.
