no code implementations • 4 Mar 2025 • Christodoulos Benetatos, Frank Cwitkowitz, Nathan Pruyne, Hugo Flores Garcia, Patrick O'Reilly, Zhiyao Duan, Bryan Pardo
HARP 2.0 brings deep learning models to digital audio workstation (DAW) software through hosted, asynchronous, remote processing, allowing users to route audio from a plug-in interface through any compatible Gradio endpoint to perform arbitrary transformations.
no code implementations • 14 Oct 2024 • Patrick O'Reilly, Prem Seetharaman, Jiaqi Su, Zeyu Jin, Bryan Pardo
Neural codecs have demonstrated strong performance in high-fidelity compression of audio signals at low bitrates.
no code implementations • 27 Sep 2024 • Annie Chu, Patrick O'Reilly, Julia Barnett, Bryan Pardo
This work introduces Text2FX, a method that leverages CLAP embeddings and differentiable digital signal processing to control audio effects, such as equalization and reverberation, using open-vocabulary natural language prompts (e.g., "make this sound in-your-face and bold").
1 code implementation • 7 Jul 2024 • Max Morrison, Cameron Churchwell, Nathan Pruyne, Bryan Pardo
Fine-grained editing of speech attributes (such as prosody, i.e., the pitch, loudness, and phoneme durations; pronunciation; speaker identity; and formants) is useful for fine-tuning and fixing imperfections in human and AI-generated speech recordings for the creation of podcasts, film dialogue, and video game dialogue.
1 code implementation • 27 Feb 2024 • Cameron Churchwell, Max Morrison, Bryan Pardo
A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes).
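Concretely, a PPG can be pictured as a (frames × phoneme classes) matrix in which each row is a probability distribution. A minimal NumPy sketch (the frame and class counts here are arbitrary, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_phonemes = 100, 40  # illustrative sizes only

# Raw per-frame acoustic scores, e.g., the logits of a phoneme classifier
logits = rng.normal(size=(n_frames, n_phonemes))

# A softmax over the phoneme axis turns each frame into a categorical
# distribution -- one row of the posteriorgram
ppg = np.exp(logits - logits.max(axis=1, keepdims=True))
ppg /= ppg.sum(axis=1, keepdims=True)

# Every frame's distribution sums to one
assert np.allclose(ppg.sum(axis=1), 1.0)

# The most likely acoustic unit at each frame is the argmax of its row
frame_labels = ppg.argmax(axis=1)
```

Downstream systems typically consume the full soft distribution `ppg` rather than the hard `frame_labels`, since the soft version preserves pronunciation ambiguity.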
no code implementations • 25 Jan 2024 • Julia Barnett, Hugo Flores Garcia, Bryan Pardo
Every artist has a creative process that draws inspiration from previous artists and their works.
1 code implementation • 12 Oct 2023 • Max Morrison, Pranav Pawar, Nathan Pruyne, Jennifer Cole, Bryan Pardo
Speech prominence estimation is the process of assigning a numeric value to the prominence of each word in an utterance.
1 code implementation • 10 Jul 2023 • Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation.
1 code implementation • 28 Jan 2023 • Max Morrison, Caedon Hsieh, Nathan Pruyne, Bryan Pardo
We also introduce a novel entropy-based method for extracting periodicity and per-frame voiced-unvoiced classifications from statistical inference-based pitch estimators (e.g., neural networks), and show how to train a neural pitch estimator to simultaneously handle both speech and music data (i.e., cross-domain estimation) without performance degradation.
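The entropy idea can be illustrated generically: a neural pitch estimator emits a categorical distribution over pitch bins at each frame, and a peaked (low-entropy) distribution signals a confidently periodic, voiced frame. The sketch below is a hedged NumPy illustration of that principle, not the authors' exact formulation; the 0.5 decision threshold is arbitrary:

```python
import numpy as np

def periodicity_from_posteriors(posteriors, threshold=0.5):
    """Derive per-frame periodicity and voiced flags from pitch posteriors.

    posteriors: (n_frames, n_bins) array; each row sums to one.
    Returns periodicity in [0, 1] (1 = fully peaked/confident) and a
    boolean voiced mask. The threshold is illustrative, not canonical.
    """
    n_bins = posteriors.shape[1]
    # Shannon entropy per frame, normalized by its maximum, log(n_bins)
    entropy = -np.sum(posteriors * np.log(posteriors + 1e-12), axis=1)
    entropy /= np.log(n_bins)
    periodicity = 1.0 - entropy  # peaked distribution -> high periodicity
    return periodicity, periodicity > threshold

# A peaked distribution (voiced frame) vs. a flat one (unvoiced frame)
peaked = np.full((1, 360), 1e-6)
peaked[0, 100] = 1.0 - 359e-6
flat = np.full((1, 360), 1.0 / 360)
p, voiced = periodicity_from_posteriors(np.vstack([peaked, flat]))
```

The same entropy score doubles as a soft confidence signal, which is why it can be thresholded for voiced/unvoiced decisions or used directly as a periodicity estimate.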
no code implementations • 26 Aug 2022 • Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo
Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics.
1 code implementation • 8 Mar 2022 • Max Morrison, Brian Tang, Gefei Tan, Bryan Pardo
ReSEval lets researchers launch A/B, ABX, Mean Opinion Score (MOS) and MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) tests on audio, image, text, or video data from a command-line interface or using one line of Python, making it as easy to run as objective evaluation.
no code implementations • 25 Oct 2021 • Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Dmitry Vedenko, Bryan Pardo
We present a software framework that integrates neural networks into the popular open-source audio editing software, Audacity, with a minimal amount of developer effort.
1 code implementation • 25 Oct 2021 • Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo
We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining.
1 code implementation • 5 Oct 2021 • Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo
Modifying the pitch and timing of an audio signal are fundamental audio editing operations with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis.
1 code implementation • 14 Jul 2021 • Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Bryan Pardo
Deep learning work on musical instrument recognition has generally focused on instrument classes for which we have abundant data.
no code implementations • 16 Feb 2021 • Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo
Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript.
no code implementations • 23 Oct 2020 • Andreas Bugler, Bryan Pardo, Prem Seetharaman
Supervised deep learning methods for performing audio source separation can be very effective in domains where there is a large amount of training data.
no code implementations • 29 Sep 2020 • Ethan Manilow, Bryan Pardo
In this paper, we introduce a simple method that can separate arbitrary musical instruments from an audio mixture.
1 code implementation • 25 Jul 2020 • Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux
Clipping the gradient is a known approach to improving gradient descent, but requires hand selection of a clipping threshold hyperparameter.
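The paper's remedy is to set the threshold automatically from the history of observed gradient norms. Below is a minimal NumPy sketch of percentile-based adaptive clipping in that spirit; the class name, the 10th-percentile setting, and the flat-array gradients are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

class AutoClipSketch:
    """Clip gradients to a percentile of the observed norm history.

    Instead of a hand-tuned constant, the threshold at each step is the
    p-th percentile of all gradient norms seen so far.
    """

    def __init__(self, percentile=10.0):
        self.percentile = percentile
        self.norm_history = []

    def __call__(self, grad):
        norm = np.linalg.norm(grad)
        self.norm_history.append(norm)
        # Threshold adapts as training progresses
        threshold = np.percentile(self.norm_history, self.percentile)
        if norm > threshold > 0:
            grad = grad * (threshold / norm)  # rescale norm down to threshold
        return grad

clipper = AutoClipSketch(percentile=10.0)
grads = [np.array([3.0, 4.0]), np.array([30.0, 40.0])]  # norms 5 and 50
clipped = [clipper(g) for g in grads]
```

In a real training loop the same logic would be applied to the concatenated parameter gradients once per optimization step, before the optimizer update.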
1 code implementation • 23 Jun 2020 • Alexander Fang, Alisa Liu, Prem Seetharaman, Bryan Pardo
Deep generative systems that learn probabilistic models from a corpus of existing music do not explicitly encode knowledge of a musical style, compared to traditional rule-based systems.
1 code implementation • 23 Jun 2020 • Alisa Liu, Alexander Fang, Gaëtan Hadjeres, Prem Seetharaman, Bryan Pardo
In this paper, we present augmentative generation (Aug-Gen), a method of dataset augmentation for any music generation system trained on a resource-constrained domain.
no code implementations • 5 Nov 2019 • Max Morrison, Bryan Pardo
Many automobile components in need of repair produce characteristic sounds.
no code implementations • 23 Oct 2019 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo
They are trained on synthetic mixtures of audio made from isolated sound source recordings so that ground truth for the separation is known.
no code implementations • 23 Oct 2019 • Alisa Liu, Prem Seetharaman, Bryan Pardo
We compare our confidence-based ensemble approach to using individual models with no selection, to an oracle that always selects the best model and to a random model selector.
no code implementations • 22 Oct 2019 • Ethan Manilow, Prem Seetharaman, Bryan Pardo
We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared musical representation for both tasks.
no code implementations • 6 Nov 2018 • Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo
These estimates, together with a weighting scheme in the time-frequency domain, based on confidence in the separation quality, are used to train a deep learning model that can be used for single-channel separation, where no source direction information is available.
1 code implementation • International Society for Music Information Retrieval Conference 2018 • Julia Wilkins, Prem Seetharaman, Alison Wahl, Bryan Pardo
We present VocalSet, a singing voice dataset of a cappella singing.
no code implementations • 23 Apr 2018 • Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, Derry FitzGerald, Bryan Pardo
For model-based methods, we organize them according to whether they concentrate on the lead signal, the accompaniment, or both.