no code implementations • 14 Oct 2021 • Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra
Because AAC has attracted attention only recently, very few works study how existing pre-trained audio and natural language processing resources perform on it.
1 code implementation • 1 Apr 2021 • Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov
Modeling various aspects that make a music piece unique is a challenging task, requiring the combination of multiple sources of information.
1 code implementation • 27 Oct 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra
In this work we propose a method for learning audio representations using an audio autoencoder (AAE), a general word embeddings model (WEM), and a multi-head self-attention (MHA) mechanism.
8 code implementations • 1 Oct 2020 • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes.
2 code implementations • 15 Jun 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra
Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features.
no code implementations • 8 Apr 2020 • Xavier Favory, Frederic Font, Xavier Serra
In our work, we propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases.
1 code implementation • 25 Nov 2019 • António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, Xavier Serra
We present a deep neural network-based methodology for synthesising percussive sounds with control over high-level timbral characteristics of the sounds.
2 code implementations • 4 Jan 2019 • Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra
To foster the investigation of label noise in sound event classification, we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data.
no code implementations • 21 Nov 2018 • Xavier Favory, Eduardo Fonseca, Frederic Font, Xavier Serra
It enables, for instance, the development of automatic tools for the annotation of large and diverse multimedia collections.
3 code implementations • 26 Jul 2018 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.