Search Results for author: Prem Seetharaman

Found 21 papers, 10 papers with code

Video-Guided Foley Sound Generation with Multimodal Controls

no code implementations • 26 Nov 2024 • Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon

MultiFoley also allows users to choose reference audio from sound effects (SFX) libraries or partial videos for conditioning.

Audio Generation

Code Drift: Towards Idempotent Neural Audio Codecs

no code implementations • 14 Oct 2024 • Patrick O'Reilly, Prem Seetharaman, Jiaqi Su, Zeyu Jin, Bryan Pardo

Neural codecs have demonstrated strong performance in high-fidelity compression of audio signals at low bitrates.

VampNet: Music Generation via Masked Acoustic Token Modeling

1 code implementation • 10 Jul 2023 • Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo

We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation.

Music Compression • Music Generation
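The masked acoustic token modeling approach named above can be illustrated with a minimal masking step. This is a sketch, not the authors' implementation: the function name, the fixed mask ratio, and the mask id are assumptions (VampNet operates on neural codec tokens with a masking schedule and a transformer that predicts the masked positions).

```python
import torch

def mask_tokens(tokens, mask_ratio=0.8, mask_id=0):
    """Randomly replace a fraction of acoustic tokens with a mask id.

    Returns the masked sequence and a boolean mask marking which
    positions the model would be trained to reconstruct.
    """
    # Draw an independent coin flip per token position
    mask = torch.rand(tokens.shape) < mask_ratio
    masked = tokens.clone()
    masked[mask] = mask_id
    return masked, mask
```

At training time, a transformer would be asked to predict the original tokens at the masked positions; at sampling time, iterative remasking and re-prediction yields new audio.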

Music Separation Enhancement with Generative Modeling

no code implementations • 26 Aug 2022 • Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo

Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics.

Music Source Separation

Unsupervised Source Separation By Steering Pretrained Music Models

1 code implementation • 25 Oct 2021 • Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo

We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining.

Audio Generation • Audio Source Separation • +3

Wav2CLIP: Learning Robust Audio Representations From CLIP

1 code implementation • 21 Oct 2021 • Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP).

Cross-Modal Retrieval • Image Generation • +3
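Distilling an audio encoder from CLIP can be sketched with a symmetric contrastive objective that pulls each audio clip's embedding toward the frozen CLIP embedding of its paired video frame. The exact loss below is an assumption for illustration, not the paper's definitive objective; `distillation_loss` and the temperature value are hypothetical names.

```python
import torch
import torch.nn.functional as F

def distillation_loss(audio_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss aligning audio embeddings with
    (assumed frozen) CLIP image embeddings of paired frames."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    logits = a @ v.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(a.size(0))          # matched pairs on the diagonal
    # Average the audio-to-image and image-to-audio directions
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Because the CLIP teacher is frozen, the audio encoder inherits CLIP's shared language-image space, which is what enables the cross-modal retrieval applications listed above.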

Chunked Autoregressive GAN for Conditional Waveform Synthesis

1 code implementation • ICLR 2022 • Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression.

Inductive Bias

What's All the FUSS About Free Universal Sound Separation Data?

no code implementations • 2 Nov 2020 • Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types.

Data Augmentation

A Study of Transfer Learning in Music Source Separation

no code implementations • 23 Oct 2020 • Andreas Bugler, Bryan Pardo, Prem Seetharaman

Supervised deep learning methods for performing audio source separation can be very effective in domains where there is a large amount of training data.

Audio Source Separation • Data Augmentation • +3

AutoClip: Adaptive Gradient Clipping for Source Separation Networks

1 code implementation • 25 Jul 2020 • Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux

Clipping the gradient is a known approach to improving gradient descent, but requires hand selection of a clipping threshold hyperparameter.

Audio Source Separation
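The adaptive alternative to a hand-picked threshold can be sketched as percentile-based clipping: track the history of observed gradient norms and clip at a chosen percentile of that history. This is a minimal illustration of the idea; the function name and the PyTorch specifics are assumptions, not the paper's reference code.

```python
import numpy as np
import torch

def autoclip_gradient(model, grad_history, percentile=10.0):
    """Clip gradients at the given percentile of the gradient-norm
    history observed so far, so the threshold adapts to training."""
    # Total gradient norm across all parameters with gradients
    total_norm = torch.norm(
        torch.stack([p.grad.detach().norm()
                     for p in model.parameters() if p.grad is not None])
    ).item()
    grad_history.append(total_norm)
    # The clip value is the p-th percentile of all norms seen so far
    clip_value = float(np.percentile(grad_history, percentile))
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
    return clip_value
```

Called once per training step between `loss.backward()` and `optimizer.step()`, this removes the clipping threshold from the set of hand-tuned hyperparameters.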

Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation

1 code implementation • 23 Jun 2020 • Alisa Liu, Alexander Fang, Gaëtan Hadjeres, Prem Seetharaman, Bryan Pardo

In this paper, we present augmentative generation (Aug-Gen), a method of dataset augmentation for any music generation system trained on a resource-constrained domain.

Music Generation

Bach or Mock? A Grading Function for Chorales in the Style of J.S. Bach

1 code implementation • 23 Jun 2020 • Alexander Fang, Alisa Liu, Prem Seetharaman, Bryan Pardo

Unlike traditional rule-based systems, deep generative systems that learn probabilistic models from a corpus of existing music do not explicitly encode knowledge of a musical style.

Model selection for deep audio source separation via clustering analysis

no code implementations • 23 Oct 2019 • Alisa Liu, Prem Seetharaman, Bryan Pardo

We compare our confidence-based ensemble approach to using individual models with no selection, to an oracle that always selects the best model, and to a random model selector.

Audio Source Separation • Clustering • +1

Bootstrapping deep music separation from primitive auditory grouping principles

no code implementations • 23 Oct 2019 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo

They are trained on synthetic mixtures of audio made from isolated sound source recordings so that ground truth for the separation is known.

Music Source Separation

Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments

no code implementations • 22 Oct 2019 • Ethan Manilow, Prem Seetharaman, Bryan Pardo

We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared musical representation for both tasks.

Class-conditional embeddings for music source separation

no code implementations • 7 Nov 2018 • Prem Seetharaman, Gordon Wichern, Shrikant Venkataramani, Jonathan Le Roux

Isolating individual instruments in a musical mixture has a myriad of potential applications, and seems imminently achievable given the levels of performance reached by recent deep learning methods.

Clustering • Deep Clustering • +1

Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures

no code implementations • 6 Nov 2018 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo

These estimates, together with a weighting scheme in the time-frequency domain, based on confidence in the separation quality, are used to train a deep learning model that can be used for single-channel separation, where no source direction information is available.

Clustering • Image Segmentation • +2
