no code implementations • 26 Nov 2024 • Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon
MultiFoley also allows users to choose reference audio from sound effects (SFX) libraries or partial videos for conditioning.
no code implementations • 14 Oct 2024 • Patrick O'Reilly, Prem Seetharaman, Jiaqi Su, Zeyu Jin, Bryan Pardo
Neural codecs have demonstrated strong performance in high-fidelity compression of audio signals at low bitrates.
1 code implementation • 10 Jul 2023 • Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation.
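A minimal sketch of the masked acoustic token modeling idea, not VampNet's actual architecture: encode audio into discrete codec tokens, mask most positions, and train a transformer to predict the hidden tokens. Vocabulary size, model dimensions, and the toy data below are illustrative assumptions.

```python
# Toy masked token modeling over discrete acoustic tokens (illustrative only).
import torch
import torch.nn as nn

VOCAB, MASK_ID, DIM = 1024, 1024, 256  # MASK_ID is an extra "mask" symbol

class MaskedTokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, DIM)  # +1 slot for the mask token
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

tokens = torch.randint(0, VOCAB, (8, 100))   # toy batch of codec token sequences
mask = torch.rand(8, 100) < 0.8              # hide most positions
inputs = tokens.masked_fill(mask, MASK_ID)
logits = MaskedTokenModel()(inputs)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # masked bins only
```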
4 code implementations • NeurIPS 2023 • Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar
Language models have been successfully used to model natural signals, such as images, speech, and music.
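Such modeling relies on first discretizing the signal; neural audio codecs typically do this with residual vector quantization (RVQ), where each stage quantizes the residual left by the previous one. A toy sketch of RVQ encoding, with random codebooks standing in for learned ones (not the paper's implementation):

```python
# Toy residual vector quantization: one token per stage per frame.
import torch

def rvq_encode(latents, codebooks):
    """latents: (frames, dim); codebooks: list of (codes, dim) tensors."""
    residual, tokens = latents, []
    for cb in codebooks:
        dists = torch.cdist(residual, cb)   # distance to every code
        idx = dists.argmin(dim=1)           # nearest code per frame
        tokens.append(idx)
        residual = residual - cb[idx]       # quantize what remains
    return torch.stack(tokens, dim=0)       # (stages, frames)

latents = torch.randn(100, 64)              # toy encoder output
codebooks = [torch.randn(1024, 64) for _ in range(4)]
tokens = rvq_encode(latents, codebooks)     # 4 tokens per frame
print(tokens.shape)                         # torch.Size([4, 100])
```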
no code implementations • 26 Aug 2022 • Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo
Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics.
1 code implementation • 25 Oct 2021 • Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo
We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining.
1 code implementation • 21 Oct 2021 • Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello
We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP).
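A hedged sketch of the distillation setup: train an audio encoder so its embeddings align with frozen, precomputed CLIP image embeddings of matching video frames under a symmetric contrastive loss. The stand-in encoder, feature sizes, and data below are assumptions, not Wav2CLIP's actual components.

```python
# Contrastive distillation sketch: matched audio/image pairs sit on the diagonal.
import torch
import torch.nn as nn
import torch.nn.functional as F

audio_encoder = nn.Sequential(nn.Linear(128, 512))  # stand-in for a real model

def contrastive_loss(audio_emb, image_emb, temperature=0.07):
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    logits = a @ v.T / temperature          # pairwise similarities
    targets = torch.arange(len(a))          # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

mel_features = torch.randn(16, 128)         # toy audio features
clip_image_emb = torch.randn(16, 512)       # frozen CLIP targets (precomputed)
loss = contrastive_loss(audio_encoder(mel_features), clip_image_emb)
```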
1 code implementation • ICLR 2022 • Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio
We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression.
no code implementations • 2 Nov 2020 • Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey
We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types.
no code implementations • 23 Oct 2020 • Andreas Bugler, Bryan Pardo, Prem Seetharaman
Supervised deep learning methods for performing audio source separation can be very effective in domains where there is a large amount of training data.
1 code implementation • 25 Jul 2020 • Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux
Clipping the gradient is a known approach to improving gradient descent, but it requires hand-selecting a clipping threshold hyperparameter.
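The proposed remedy is to set the threshold automatically from the distribution of gradient norms observed so far in training. A minimal sketch of that percentile idea in PyTorch; the percentile value and toy model are illustrative:

```python
# Percentile-based adaptive gradient clipping (sketch).
import numpy as np
import torch

grad_history = []

def autoclip(model, percentile=10.0):
    # total gradient norm = norm of the vector of per-parameter norms
    norms = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    grad_history.append(torch.norm(torch.stack(norms)).item())
    torch.nn.utils.clip_grad_norm_(
        model.parameters(), np.percentile(grad_history, percentile))

# toy usage: one training step
model = torch.nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
autoclip(model)  # clip to the 10th percentile of norms seen so far
```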
1 code implementation • 12 Jul 2020 • Omkar Ranadive, Grant Gasser, David Terpay, Prem Seetharaman
The agent receives a reward for turning off a source.
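For context, the paper frames separation as a game (OtoWorld) in which an agent moves through a room of sounding sources. A toy reward function in that spirit; positions, radius, and reward magnitude are invented for illustration:

```python
# Illustrative reward: the agent earns a reward only when it gets close
# enough to a still-active source to "turn it off" (values are made up).
import numpy as np

def step_reward(agent_pos, source_positions, off, radius=1.0, reward=10.0):
    r = 0.0
    for i, src in enumerate(source_positions):
        if not off[i] and np.linalg.norm(agent_pos - src) < radius:
            off[i] = True   # source is silenced
            r += reward
    return r

off = [False, False]
print(step_reward(np.array([0.0, 0.0]),
                  [np.array([0.5, 0.5]), np.array([5.0, 5.0])], off))  # 10.0
```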
1 code implementation • 23 Jun 2020 • Alisa Liu, Alexander Fang, Gaëtan Hadjeres, Prem Seetharaman, Bryan Pardo
In this paper, we present augmentative generation (Aug-Gen), a method of dataset augmentation for any music generation system trained on a resource-constrained domain.
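A minimal sketch of the Aug-Gen loop under stated assumptions: sample from the trained model, keep only outputs that pass a quality filter, and fold the survivors back into the training corpus. The `sample` and `score` callables are hypothetical stand-ins.

```python
# Augmentative generation sketch: the model's own filtered outputs
# grow the resource-limited training set.
import random

def aug_gen(sample, score, train_data, n_samples=100, threshold=0.8):
    generated = [sample() for _ in range(n_samples)]
    keepers = [g for g in generated if score(g) >= threshold]
    return train_data + keepers

data = aug_gen(sample=lambda: random.random(),  # toy generator
               score=lambda g: g,               # toy quality score
               train_data=[0.9, 0.95])
```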
1 code implementation • 23 Jun 2020 • Alexander Fang, Alisa Liu, Prem Seetharaman, Bryan Pardo
Unlike traditional rule-based systems, deep generative systems that learn probabilistic models from a corpus of existing music do not explicitly encode knowledge of a musical style.
no code implementations • 23 Oct 2019 • Alisa Liu, Prem Seetharaman, Bryan Pardo
We compare our confidence-based ensemble approach to using individual models with no selection, to an oracle that always selects the best model, and to a random model selector.
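A sketch of that selection step with a hypothetical stand-in confidence measure (the paper derives its measure from clustering analysis rather than the placeholder below):

```python
# Confidence-based model selection sketch: run every model in the
# ensemble and keep the estimate whose confidence score is highest.
import numpy as np

def confidence(estimate):
    # placeholder: how far mask values are from the ambiguous 0.5
    return float(np.mean(np.abs(estimate - 0.5)))

def select_best(mixture, models):
    estimates = [m(mixture) for m in models]
    return max(estimates, key=confidence)

models = [lambda x: np.clip(x, 0, 1), lambda x: np.full_like(x, 0.5)]
best = select_best(np.random.rand(10), models)  # picks the confident mask
```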
no code implementations • 23 Oct 2019 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo
They are trained on synthetic mixtures of audio made from isolated sound source recordings so that ground truth for the separation is known.
no code implementations • 22 Oct 2019 • Ethan Manilow, Prem Seetharaman, Bryan Pardo
We present a single deep learning architecture that simultaneously separates an audio recording of a musical mixture into its constituent single-instrument recordings and transcribes those instruments into a human-readable format, learning a shared musical representation for both tasks.
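A toy sketch of the shared-representation idea: one encoder feeds both a separation head that predicts a time-frequency mask and a transcription head that predicts piano-roll logits. Layer types and sizes are illustrative assumptions, not the paper's model.

```python
# Shared encoder, two task heads (separation mask + transcription logits).
import torch
import torch.nn as nn

class SeparateAndTranscribe(nn.Module):
    def __init__(self, n_freq=513, n_pitches=88):
        super().__init__()
        self.shared = nn.GRU(n_freq, 256, batch_first=True)  # shared representation
        self.mask_head = nn.Sequential(nn.Linear(256, n_freq), nn.Sigmoid())
        self.note_head = nn.Linear(256, n_pitches)           # piano-roll logits

    def forward(self, spectrogram):                          # (batch, time, freq)
        h, _ = self.shared(spectrogram)
        return self.mask_head(h) * spectrogram, self.note_head(h)

separated, notes = SeparateAndTranscribe()(torch.randn(2, 50, 513))
```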
no code implementations • 18 Sep 2019 • Ethan Manilow, Gordon Wichern, Prem Seetharaman, Jonathan Le Roux
In this paper, we present the synthesized Lakh dataset (Slakh) as a new tool for music source separation research.
no code implementations • 7 Nov 2018 • Prem Seetharaman, Gordon Wichern, Shrikant Venkataramani, Jonathan Le Roux
Isolating individual instruments in a musical mixture has a myriad of potential applications, and seems imminently achievable given the levels of performance reached by recent deep learning methods.
no code implementations • 6 Nov 2018 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo
These estimates, together with a time-frequency weighting scheme based on confidence in the separation quality, are used to train a deep learning model for single-channel separation, where no source direction information is available.
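A sketch of how such confidence weights might enter training: each time-frequency bin's loss term is scaled by how much the bootstrapped label at that bin is trusted. The squared-error formulation and tensor shapes are illustrative.

```python
# Confidence-weighted time-frequency loss sketch: trusted bins
# contribute more to training than uncertain ones.
import torch

def weighted_tf_loss(pred_mask, est_label_mask, confidence_weights):
    # all tensors: (batch, time, freq); weights in [0, 1]
    per_bin = (pred_mask - est_label_mask) ** 2
    return (confidence_weights * per_bin).sum() / confidence_weights.sum()

loss = weighted_tf_loss(torch.rand(2, 50, 257),
                        torch.rand(2, 50, 257),
                        torch.rand(2, 50, 257))
```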
1 code implementation • International Society for Music Information Retrieval Conference 2018 • Julia Wilkins, Prem Seetharaman, Alison Wahl, Bryan Pardo
We present VocalSet, a singing voice dataset of a cappella singing.