no code implementations • 17 Apr 2024 • Keren Shao, Ke Chen, Shlomo Dubnov
In this challenge, we disentangle the deep filters from the original DeepFilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs)-based remixing pipeline.
no code implementations • 9 Feb 2024 • Vignesh Gokul, Chris Francis, Shlomo Dubnov
We propose a method to compute the information flow using pre-trained generative models as entropy estimators.
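A minimal sketch of the entropy-estimation idea: a generative model's average negative log-likelihood over samples is a Monte-Carlo estimate of entropy, and information flow can be built from differences of such estimates. A toy categorical table stands in for the pretrained generative model here.

```python
import math

# Toy categorical "generative model": probabilities over 4 symbols.
# A pretrained model's likelihoods would replace this table.
p_x = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}

def entropy_estimate(samples, model):
    """Monte-Carlo entropy estimate: the average negative
    log-likelihood of the samples under the model (in bits)."""
    return -sum(math.log2(model[s]) for s in samples) / len(samples)

# With samples drawn at exactly the model frequencies, the estimate
# matches the true entropy H = 1.75 bits.
samples = [0, 0, 0, 0, 1, 1, 2, 3]
h = entropy_estimate(samples, p_x)
```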
no code implementations • 4 Jan 2024 • Vignesh Gokul, Shlomo Dubnov
Recent works such as CUDA propose solutions to this problem by adding class-wise blurs to make datasets unlearnable, i.e., a model can never use the acquired dataset for learning.
no code implementations • 21 Nov 2023 • Weihan Xu, Julian McAuley, Shlomo Dubnov, Hao-Wen Dong
We then propose a simple technique to equip this pretrained unconditional music transformer model with instrument and genre controls by finetuning the model with additional control tokens.
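The control-token technique can be illustrated in a few lines: reserve extra vocabulary ids for instruments and genres, prepend them to each training sequence, and finetune the pretrained model on the augmented sequences. The vocabulary ids below are hypothetical, not the paper's.

```python
# Hypothetical vocabulary: musical event ids 0..99, control ids 100+.
INSTRUMENT = {"piano": 100, "guitar": 101}
GENRE = {"jazz": 110, "rock": 111}

def add_control_tokens(events, instrument, genre):
    """Prepend instrument and genre control tokens so a pretrained
    unconditional model can be finetuned to condition on them."""
    return [INSTRUMENT[instrument], GENRE[genre]] + list(events)

seq = add_control_tokens([5, 17, 42], "piano", "jazz")
```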
no code implementations • 14 Oct 2023 • Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models.
1 code implementation • 4 Aug 2023 • Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov
In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance.
1 code implementation • 3 Aug 2023 • Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov
Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.
2 code implementations • 14 Jul 2022 • Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, Taylor Berg-Kirkpatrick
Existing approaches for generating multitrack music with transformer models have been limited by the number of instruments they support, the length of the music segments they can generate, and their slow inference.
1 code implementation • 2 Feb 2022 • Ke Chen, Shuai Yu, Cheng-i Wang, Wei Li, Taylor Berg-Kirkpatrick, Shlomo Dubnov
In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perceptions by leveraging a novel input representation and a novel network architecture.
1 code implementation • 2 Feb 2022 • Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov
To combat these problems, we introduce HTS-AT: an audio transformer with a hierarchical structure to reduce the model size and training time.
Ranked #4 on Sound Event Detection on DESED
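The core trick of a hierarchical audio transformer is shrinking the token sequence between stages so later attention layers see far fewer tokens. A rough stand-in for that patch merging, averaging adjacent tokens to halve the sequence length:

```python
def merge_tokens(tokens, group=2):
    """Merge each group of adjacent tokens by averaging, halving the
    sequence length between transformer stages (a simplified stand-in
    for the patch merging in hierarchical audio transformers)."""
    merged = []
    for i in range(0, len(tokens) - group + 1, group):
        chunk = tokens[i:i + group]
        merged.append(sum(chunk) / group)
    return merged

seq = [1.0, 3.0, 5.0, 7.0]
out = merge_tokens(seq)  # length 4 -> length 2
```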
1 code implementation • 15 Dec 2021 • Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov
Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.
Ranked #1 on Audio Source Separation on AudioSet
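The single-model, multiple-source idea can be sketched as query-conditioned masking: one separator keeps whichever mixture content matches a query embedding. The 2-D "embeddings" below are toy stand-ins for learned ones, not the paper's actual architecture.

```python
def cosine(a, b):
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den

def separate(mixture_frames, query):
    """Soft-mask mixture frames by their similarity to the query, so
    a single model can separate whichever source the query selects."""
    masks = [max(0.0, cosine(f, query)) for f in mixture_frames]
    return [[m * x for x in f] for m, f in zip(masks, mixture_frames)]

frames = [[1.0, 0.0], [0.0, 1.0]]
out = separate(frames, query=[1.0, 0.0])  # keeps only the first frame
```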
no code implementations • AAAI 2021 • Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov
no code implementations • 24 Nov 2021 • Shlomo Dubnov, Kevin Huang, Cheng-i Wang
The framework is based on a Music Information Dynamics model, a Variable Markov Oracle (VMO), and is extended with variational representation learning of audio.
no code implementations • 13 Apr 2021 • Eunjeong Koh, Shlomo Dubnov
We implement several multi-class classifiers with deep audio embeddings to predict emotion semantics in music.
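The pipeline of classifying over fixed deep audio embeddings can be sketched with a nearest-centroid classifier; the 2-D points below are hypothetical stand-ins for real embeddings (which would have hundreds of dimensions) and the labels are illustrative.

```python
# Toy 2-D "embeddings" per emotion label; real deep audio embeddings
# from a pretrained network would replace these.
train = {
    "happy": [(0.9, 0.8), (1.0, 0.9)],
    "sad":   [(0.1, 0.2), (0.0, 0.1)],
}

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def predict(embedding):
    """Nearest-centroid emotion prediction in embedding space."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    cents = {label: centroid(pts) for label, pts in train.items()}
    return min(cents, key=lambda lbl: dist2(embedding, cents[lbl]))

label = predict((0.8, 0.7))
```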
no code implementations • 17 Mar 2021 • Vaikkunth Mugunthan, Vignesh Gokul, Lalana Kagal, Shlomo Dubnov
Our approach generates metadata at the aggregator using the models received from clients and retrains the federated model to achieve bias-free results for image synthesis.
1 code implementation • 4 Mar 2021 • Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar
There has been a recent surge in adversarial attacks on deep learning based automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR) +1
1 code implementation • 15 Feb 2021 • Paarth Neekhara, Shehzeen Hussain, Jinglong Du, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
Recent works on adversarial reprogramming have shown that it is possible to repurpose neural networks for alternate tasks without modifying the network architecture or parameters.
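The structure of adversarial reprogramming can be shown with a toy: the pretrained classifier stays frozen, and the attack consists only of an input transformation plus a mapping from the original labels to the new task's labels. The perturbation here is hand-picked for illustration; in practice it is optimized by gradient descent.

```python
# Frozen "pretrained classifier": classifies a 2-vector by comparing
# its coordinates. Its parameters are never modified.
def frozen_classifier(x):
    return 0 if x[0] >= x[1] else 1

# Adversarial program: an additive input perturbation (hand-picked
# here; learned in practice) plus a label map onto the new task.
program = (2.0, 0.0)
label_map = {0: "even", 1: "odd"}

def reprogrammed(x):
    """Repurpose the frozen classifier for a new task by transforming
    its input and remapping its output labels."""
    shifted = (x[0] + program[0], x[1] + program[1])
    return label_map[frozen_classifier(shifted)]
```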
no code implementations • 1 Feb 2021 • Shlomo Dubnov
In this paper we introduce a novel framework, Deep Musical Information Dynamics, which combines two parallel streams: a low-rate latent representation stream assumed to capture the dynamics of a thought process, contrasted with a higher-rate information dynamics stream derived from the musical data itself.
no code implementations • 30 Jan 2021 • Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
In this work, we propose a controllable voice cloning method that allows fine-grained control over various style aspects of the synthesized speech for an unseen speaker.
no code implementations • 22 Oct 2020 • Vaikkunth Mugunthan, Vignesh Gokul, Lalana Kagal, Shlomo Dubnov
The Information Maximizing GAN (InfoGAN) is a variant of the standard GAN that introduces feature-control variables learned automatically by the framework, providing greater control over the different kinds of images produced.
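The InfoGAN input construction is simple to sketch: the generator receives random noise z concatenated with a structured latent code c, and a mutual-information objective (not shown) pushes the generator to use c as an interpretable control.

```python
import random

def sample_generator_input(noise_dim=4, code_dim=2, seed=None):
    """InfoGAN-style generator input: Gaussian noise z concatenated
    with a structured latent code c that serves as the learned
    feature-control variable."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    c = [rng.uniform(-1.0, 1.0) for _ in range(code_dim)]
    return z + c, c

inp, code = sample_generator_input(seed=0)
```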
1 code implementation • 4 Aug 2020 • Ke Chen, Cheng-i Wang, Taylor Berg-Kirkpatrick, Shlomo Dubnov
Drawing an analogy with automatic image completion systems, we propose Music SketchNet, a neural network framework that allows users to specify partial musical ideas guiding automatic music generation.
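The sketch-completion interface can be illustrated independently of the network: the user pins some measures, `None` marks the gaps, and a model fills them in. The interpolating `fill_measure` below is a trivial placeholder for SketchNet's neural decoder.

```python
def fill_measure(prev, nxt):
    # Placeholder "model": interpolate between neighboring user
    # pitches; a trained neural decoder would replace this.
    if prev is None:
        return nxt
    if nxt is None:
        return prev
    return [(a + b) // 2 for a, b in zip(prev, nxt)]

def complete(sketch):
    """sketch: list of measures; None marks measures to be filled."""
    out = list(sketch)
    for i, m in enumerate(out):
        if m is None:
            prev = next((out[j] for j in range(i - 1, -1, -1) if out[j]), None)
            nxt = next((out[j] for j in range(i + 1, len(out)) if out[j]), None)
            out[i] = fill_measure(prev, nxt)
    return out

result = complete([[60, 62], None, [64, 66]])
```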
1 code implementation • 5 Feb 2020 • Ke Chen, Gus Xia, Shlomo Dubnov
Automatic music generation is an interdisciplinary research topic that combines computational creativity and semantic analysis of music to create automatic machine improvisations.
1 code implementation • 21 Jun 2019 • Shlomo Dubnov
In this paper we explore techniques for generating new music using a Variational Autoencoder (VAE) neural network that was trained on a corpus of a specific style.
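The basic VAE generation loop is: sample a latent vector from the standard-normal prior and decode it. The fixed linear `decode` rule below is a hypothetical stand-in for a trained decoder network.

```python
import random

# Toy decoder: maps a 2-D latent to a 4-note pitch sequence by a
# fixed linear rule; a trained VAE decoder would replace this.
def decode(z):
    base = 60  # MIDI middle C
    return [round(base + z[0] * 2 + z[1] * step) for step in range(4)]

def generate(seed=None):
    """Sample a latent from the standard-normal prior and decode it:
    the basic VAE generation loop."""
    rng = random.Random(seed)
    z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    return decode(z)

melody = generate(seed=1)
```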
no code implementations • 9 May 2019 • Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar
In this work, we demonstrate the existence of universal adversarial audio perturbations that cause mis-transcription of audio signals by automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR) +1
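What makes such a perturbation "universal" is that a single additive delta changes the output on every input in a set. A greedy toy sketch of that search, with a thresholding "recognizer" standing in for an ASR system (real attacks optimize the delta with gradients under a norm bound):

```python
# Toy "recognizer": transcribes a 1-D signal sample by thresholding.
def transcribe(signal):
    return ["hi" if s > 0.5 else "lo" for s in signal]

def find_universal_perturbation(signals, step=0.2, max_iters=20):
    """Greedily grow a single additive delta until it changes the
    transcription of every signal in the set."""
    delta = 0.0
    originals = [transcribe(s) for s in signals]
    for _ in range(max_iters):
        perturbed = [transcribe([x + delta for x in s]) for s in signals]
        if all(p != o for p, o in zip(perturbed, originals)):
            return delta
        delta += step
    return delta

signals = [[0.1, 0.2], [0.3, 0.4]]
delta = find_universal_perturbation(signals)
```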
1 code implementation • 16 Apr 2019 • Paarth Neekhara, Chris Donahue, Miller Puckette, Shlomo Dubnov, Julian McAuley
Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms.
1 code implementation • 20 Nov 2018 • Ke Chen, Weilin Zhang, Shlomo Dubnov, Gus Xia, Wei Li
With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity.
no code implementations • 7 Oct 2018 • Eunjeong Stella Koh, Shlomo Dubnov, Dustin Wright
Our results suggest that the proposed model has a better statistical resemblance to the musical structure of the training data, which improves the creation of new sequences of music in the style of the originals.
1 code implementation • IJCNLP 2019 • Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar
Adversarial Reprogramming has demonstrated success in utilizing pre-trained neural network classifiers for alternative classification tasks without modification to the original network.