Vocal Percussion Transcription (VPT) is concerned with the automatic detection and classification of vocal percussion sound events, allowing music creators and producers to sketch drum lines on the fly.
Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities.
In this work we propose a HyperTransformer, a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
From a practical perspective, our approach allows one to: (a) reuse existing modules to learn a new task by adjusting the computation order; (b) perform unsupervised multi-source domain adaptation, illustrating that adaptation to unseen data can be achieved by manipulating only the order of pretrained modules; and (c) increase the accuracy of existing architectures on image classification tasks such as ImageNet, without any increase in parameters, by reusing the same block multiple times.
We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule.
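The two-state view can be made concrete in a few lines: one set of arrays holds the forward activations, a second holds the gradients that the chain rule propagates backwards. A minimal NumPy sketch (the tiny two-layer network and all names are illustrative, not the paper's formulation):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))
x, y = rng.standard_normal(3), rng.standard_normal(2)

# State 1: activations (forward pass).
h = np.tanh(x @ W1)
out = h @ W2
loss = 0.5 * np.sum((out - y) ** 2)

# State 2: gradients (backward pass via the chain rule).
g_out = out - y                        # dL/d(out)
g_h = (g_out @ W2.T) * (1 - h ** 2)    # dL/dh through tanh
grad_W2 = np.outer(h, g_out)
grad_W1 = np.outer(x, g_h)
```

The same update structure applies layer by layer in deeper networks; only the local derivative of each activation changes.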
In this paper we address the question: can task-specific detectors be trained and represented as a shared set of weights, plus a very small set of additional weights for each task?
Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning.
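For reference, the classic soft-target form of knowledge distillation trains the student to match temperature-softened teacher probabilities while still fitting the hard labels. A minimal NumPy sketch (the temperature and mixing weight shown are illustrative defaults, not values from this work):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend a temperature-softened KL term (teacher -> student) with
    ordinary cross-entropy on the hard labels."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - log_p_s), axis=-1))
    idx = np.arange(len(labels))
    ce = np.mean(-np.log(softmax(student_logits)[idx, labels] + 1e-12))
    # T**2 keeps gradient magnitudes comparable across temperatures.
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

A student whose logits already match the teacher's pays only the (down-weighted) hard-label term.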
The update rule is applied repeatedly, in parallel, to a large random subset of cells; after convergence, the resulting cell states are used to produce segmentation masks, and the loss on these masks is back-propagated to learn the optimal update rule with standard gradient-descent methods.
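A hand-written local rule can stand in for the learned one to illustrate the mechanics: at each step a random subset of grid cells fires in parallel, and the converged state is thresholded into a mask. In the actual method the rule is a small neural network trained by backpropagation; this sketch merely fixes it to a 4-neighbour average:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, rate=0.5):
    # 4-neighbour average as a stand-in for the learned update rule.
    nbr = (np.roll(state, 1, 0) + np.roll(state, -1, 0) +
           np.roll(state, 1, 1) + np.roll(state, -1, 1)) / 4.0
    fire = rng.random(state.shape) < rate   # random subset of cells updates
    return state + (nbr - state) * fire

state = rng.random((16, 16))
for _ in range(200):
    state = step(state)
seg_mask = state > state.mean()             # threshold converged state
```

The asynchronous, stochastic updates are what make the rule robust to the order in which cells are visited.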
Despite the success of deep neural networks (DNNs), state-of-the-art models are too large to deploy on low-resource devices or common server configurations in which multiple models are held in memory.
We explore the question of how the resolution of the input image ("input resolution") affects the performance of a neural network when compared to the resolution of the hidden layers ("internal resolution").
In addition, we show that the adversarial attacks are very effective across the different models.
The empirical evidence suggests that the proposed method for computing visibility graphs offers an online solution at no additional computation-time cost.
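For context, the standard (batch, O(n²)) definition of a natural visibility graph links two samples whenever every sample between them lies strictly below the straight line joining them; the contribution described above is computing this online, which the following sketch does not attempt:

```python
def visibility_graph(series):
    """Natural visibility graph of a time series: nodes a < b are linked
    iff every intermediate sample lies strictly below the line (a,ya)-(b,yb)."""
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            if all(series[c] < ya + (yb - ya) * (c - a) / (b - a)
                   for c in range(a + 1, b)):
                edges.add((a, b))
    return edges
```

Adjacent samples are always mutually visible, so every node connects to its neighbours and the graph is connected.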
We achieve new state-of-the-art results for mobile classification, detection and segmentation.
We present experiments demonstrating the utility of this distance measure for real and synthesised audio data.
We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks.
In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective, so that the search can identify a model that achieves a good trade-off between accuracy and latency.
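The soft latency constraint described here amounts to maximising accuracy weighted by the latency-to-target ratio raised to a small negative exponent. A sketch of that objective (the target latency and exponent are illustrative values, not this paper's settings):

```python
def latency_aware_objective(accuracy, latency_ms, target_ms=80.0, w=-0.07):
    """Soft latency-aware search objective: accuracy * (latency/target)**w.
    With w < 0, models slower than the target are penalised smoothly
    rather than rejected outright."""
    return accuracy * (latency_ms / target_ms) ** w
```

At exactly the target latency the objective reduces to plain accuracy, so the exponent only trades off deviations from the budget.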
This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget.
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks, as well as across a spectrum of different model sizes.
Following their success in Computer Vision and other areas, deep learning techniques have recently become widely adopted in Music Information Retrieval (MIR) research.
In this paper, we empirically investigate the effect of audio preprocessing on music tagging with deep neural networks.
The results highlight several important aspects of music tagging and neural networks.
In this paper, we present a transfer learning approach for music classification and regression tasks.
Deep convolutional neural networks (CNNs) have been actively adopted in the field of music information retrieval, e.g. genre classification, mood detection, and chord recognition.
We introduce a novel playlist generation algorithm that focuses on the quality of transitions using a recurrent neural network (RNN).