1 code implementation • 14 Feb 2024 • Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora
APNet allows prototypes' reconstruction to waveforms for interpretability relying on the nearest training data samples.
no code implementations • 14 Dec 2023 • Benno Weck, Holger Kirchhoff, Peter Grosche, Xavier Serra
The model is evaluated on two tasks: tag-based music retrieval and music auto-tagging.
no code implementations • 14 Nov 2023 • Jyoti Narang, Viviana De La Vega, Xavier Lizarraga, Oscar Mayor, Hector Parra, Jordi Janer, Xavier Serra
Choral singing, a widely practiced form of ensemble singing, lacks comprehensive datasets in the realm of Music Information Retrieval (MIR) research, due to challenges arising from the requirement to curate multitrack recordings.
no code implementations • 23 Feb 2023 • Benno Weck, Xavier Serra
In our experiments, we find that the new splits serve as a more challenging benchmark.
1 code implementation • 15 Nov 2022 • Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes
Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos.
no code implementations • 6 Oct 2022 • Benno Weck, Miguel Pérez Fernández, Holger Kirchhoff, Xavier Serra
We present an analysis of large-scale pretrained deep learning models used for cross-modal (text-to-audio) retrieval.
1 code implementation • 22 Jul 2022 • Jose J. Valero-Mas, Antonio Javier Gallego, Pablo Alonso-Jiménez, Xavier Serra
Prototype Generation (PG) methods are typically considered for improving the efficiency of the $k$-Nearest Neighbour ($k$NN) classifier when tackling high-size corpora.
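As a rough illustration of why prototype generation helps $k$NN scale, the sketch below reduces a labelled set to one prototype per class and classifies with 1-NN against those prototypes only. The class-centroid reduction and the names `class_centroids` and `predict_1nn` are assumptions chosen for brevity; they are not the method proposed in the paper.

```python
from collections import defaultdict
from math import dist


def class_centroids(points, labels):
    """Reduce a labelled set to one prototype (the mean vector) per class."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in groups.items()
    }


def predict_1nn(prototypes, query):
    """Return the label of the prototype closest to the query (Euclidean distance)."""
    return min(prototypes, key=lambda label: dist(prototypes[label], query))


# Toy data: six training points collapse to two prototypes.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
prototypes = class_centroids(train_x, train_y)
```

Each query is then compared against two prototypes instead of six training points, which is the efficiency gain PG methods aim for on large corpora.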
1 code implementation • 26 Nov 2021 • Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, Xavier Serra
Content creators often use music to enhance their stories, as it can be a powerful tool to convey emotion.
no code implementations • 14 Oct 2021 • Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra
Having attracted attention only recently, very few works on AAC study the performance of existing pre-trained audio and natural language processing resources.
no code implementations • 26 Sep 2021 • Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello
Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version.
1 code implementation • 1 Jul 2021 • Eduardo Fonseca, Andres Ferraro, Xavier Serra
Recent studies have put into question the commonly assumed shift invariance property of convolutional networks, showing that small shifts in the input can affect the output predictions substantially.
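A toy example can make the shift-sensitivity claim concrete. The sketch below applies stride-2 max pooling to a 1-D signal and to the same signal delayed by one sample. The helpers `max_pool` and `shift_right` are hypothetical and written only for this illustration; they are not taken from the paper.

```python
def max_pool(signal, size=2, stride=2):
    """Max pooling over non-overlapping windows (stride equals window size here)."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, stride)]


def shift_right(signal, n=1, fill=0):
    """Delay the signal by n samples, padding on the left and keeping the length."""
    return [fill] * n + signal[:len(signal) - n]


x = [0, 1, 0, 0, 3, 0, 0, 0]
pooled = max_pool(x)                       # [1, 0, 3, 0]
pooled_shifted = max_pool(shift_right(x))  # [0, 1, 3, 0]
```

The two pooled outputs differ, and the second is not simply a delayed copy of the first, so any layer reading them sees a different pattern after a one-sample shift of the input.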
1 code implementation • 21 May 2021 • Pritish Chandna, António Ramires, Xavier Serra, Emilia Gómez
Loops, seamlessly repeatable musical segments, are a cornerstone of modern music production.
1 code implementation • 5 May 2021 • Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra
Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings.
1 code implementation • 30 Jan 2021 • Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra, Dmitry Bogdanov
We present Melon Playlist Dataset, a public dataset of mel-spectrograms for 649,091 tracks and 148,826 associated playlists annotated by 30,652 different tags.
1 code implementation • 15 Nov 2020 • Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra
Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data, a common scenario in sound event research.
1 code implementation • 30 Oct 2020 • Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra
In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings.
1 code implementation • 27 Oct 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra
In this work we propose a method for learning audio representations using an audio autoencoder (AAE), a general word embeddings model (WEM), and a multi-head self-attention (MHA) mechanism.
8 code implementations • 1 Oct 2020 • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes.
1 code implementation • 26 Aug 2020 • Antonio Ramires, Frederic Font, Dmitry Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, Bo-Yu Chen, Yueh-Kao Wu, Hsu Wei-Han, Xavier Serra
We present the Freesound Loop Dataset (FSLD), a new large-scale dataset of music loops annotated by experts.
Audio and Speech Processing Sound
1 code implementation • 17 Aug 2020 • Andres Ferraro, Dietmar Jannach, Xavier Serra
Specifically, we analyze to what extent algorithms of different types may lead to concentration effects over time.
2 code implementations • 15 Jun 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra
Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features.
7 code implementations • 1 Jun 2020 • Minz Won, Andres Ferraro, Dmitry Bogdanov, Xavier Serra
Recent advances in deep learning accelerated the development of content-based automatic music tagging systems.
Ranked #1 on Music Auto-Tagging on MagnaTagATune (clean)
Music Auto-Tagging Audio and Speech Processing Sound
no code implementations • 2 May 2020 • Eduardo Fonseca, Shawn Hershey, Manoj Plakal, Daniel P. W. Ellis, Aren Jansen, R. Channing Moore, Xavier Serra
The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets.
no code implementations • 8 Apr 2020 • Xavier Favory, Frederic Font, Xavier Serra
In our work, we propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases.
no code implementations • 16 Mar 2020 • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra
Essentia is a reference open-source C++/Python library for audio and music analysis.
1 code implementation • 25 Nov 2019 • António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, Xavier Serra
We present a deep neural network-based methodology for synthesising percussive sounds with control over high-level timbral characteristics of the sounds.
no code implementations • 12 Nov 2019 • Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jay Ho Jeon, Jason Yoon
Automatic tagging of music is an important research topic in Music Information Retrieval and audio analysis algorithms proposed for this task have achieved improvements with advances in deep learning.
no code implementations • 12 Nov 2019 • Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jason Yoon
Algorithms have an increasing influence on the music that we consume and understanding their behavior is fundamental to make sure they give a fair exposure to all artists across different styles.
no code implementations • 11 Nov 2019 • Minz Won, Sanghyuk Chun, Xavier Serra
Recently, we proposed a self-attention based music tagging model.
Sound Audio and Speech Processing
1 code implementation • 26 Oct 2019 • Eduardo Fonseca, Frederic Font, Xavier Serra
We show that these simple methods can be effective in mitigating the effect of label noise, providing up to 2.5% of accuracy boost when incorporated to two different CNNs, while requiring minimal intervention and computational overhead.
4 code implementations • 14 Sep 2019 • Jordi Pons, Xavier Serra
Pronounced as "musician", the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn.
1 code implementation • 27 Aug 2019 • Andres Perez-Lopez, Eduardo Fonseca, Xavier Serra
This work describes and discusses an algorithm submitted to the Sound Event Localization and Detection Task of DCASE2019 Challenge.
1 code implementation • 19 Jul 2019 • António Ramires, Xavier Serra
Reusing recorded sounds (sampling) is a key component in Electronic Music Production (EMP), which has been present since its early days and is at the core of genres like hip-hop or jungle.
Sound Audio and Speech Processing
2 code implementations • 12 Jun 2019 • Minz Won, Sanghyuk Chun, Xavier Serra
In addition, we demonstrate the interpretability of the proposed architecture with a heat map visualization.
Sound Audio and Speech Processing
2 code implementations • 7 Jun 2019 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra
The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sound classes.
1 code implementation • 28 Mar 2019 • Andrés Ferraro, Dmitry Bogdanov, Xavier Serra
The Spotify Sequential Skip Prediction Challenge focuses on predicting if a track in a session will be skipped by the user or not.
2 code implementations • 4 Jan 2019 • Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra
To foster the investigation of label noise in sound event classification we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.
no code implementations • 21 Nov 2018 • Xavier Favory, Eduardo Fonseca, Frederic Font, Xavier Serra
It enables, for instance, the development of automatic tools for the annotation of large and diverse multimedia collections.
2 code implementations • 29 Oct 2018 • Francesc Lluís, Jordi Pons, Xavier Serra
Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase.
Ranked #26 on Music Source Separation on MUSDB18
2 code implementations • 24 Oct 2018 • Jordi Pons, Joan Serrà, Xavier Serra
We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.
3 code implementations • 26 Jul 2018 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.
1 code implementation • 6 Jul 2018 • Sergio Oramas, Luis Espinosa-Anke, Francisco Gómez, Xavier Serra
Today, a massive amount of musical knowledge is stored in written form, with testimonies dated as far back as several centuries ago.
2 code implementations • 19 Jun 2018 • Eduardo Fonseca, Rong Gong, Xavier Serra
In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine.
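The excerpt does not say how the two systems are fused. One common option is late fusion, sketched below as a weighted average of per-class probabilities. The function `fuse`, the equal weighting, and the probability values are all assumptions for illustration, not the authors' stated procedure or results.

```python
def fuse(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two per-class probability vectors of equal length."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(probs_a, probs_b)]


# Made-up outputs for one clip over three classes.
cnn_probs = [0.6, 0.3, 0.1]  # e.g. from a CNN on log-mel spectrograms
gbm_probs = [0.2, 0.7, 0.1]  # e.g. from gradient boosting on hand-crafted features

fused = fuse(cnn_probs, gbm_probs)
predicted = max(range(len(fused)), key=fused.__getitem__)  # index of the top class
```

Here the two models disagree on the top class and the averaged scores settle on class index 1. In practice `weight_a` would be tuned on held-out data.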
2 code implementations • 18 Jun 2018 • Rong Gong, Xavier Serra
We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack of hyper-parameter tuning details, (ii) the non-availability of code for training models on other datasets, and (iii) ignoring the network capability when comparing different architectures.
no code implementations • 7 Jun 2018 • Emilia Gómez, Carlos Castillo, Vicky Charisi, Verónica Dahl, Gustavo Deco, Blagoj Delipetrev, Nicole Dewandre, Miguel Ángel González-Ballester, Fabien Gouyon, José Hernández-Orallo, Perfecto Herrera, Anders Jonsson, Ansgar Koene, Martha Larson, Ramón López de Mántaras, Bertin Martens, Marius Miron, Rubén Moreno-Bote, Nuria Oliver, Antonio Puertas Gallardo, Heike Schweitzer, Nuria Sebastian, Xavier Serra, Joan Serrà, Songül Tolan, Karina Vold
The workshop gathered an interdisciplinary group of experts to establish the state of the art research in the field and a list of future research challenges to be addressed on the topic of human and machine intelligence, algorithm's potential impact on human cognitive capabilities and decision making, and evaluation and regulation needs.
3 code implementations • 5 Jun 2018 • Rong Gong, Xavier Serra
In the second step, the syllable and phoneme boundaries and labels are inferred hierarchically by using a duration-informed hidden Markov model (HMM).
Sound Information Retrieval Audio and Speech Processing
1 code implementation • 5 May 2018 • Jaehun Kim, Minz Won, Xavier Serra, Cynthia C. S. Liem
The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy.
2 code implementations • 1 May 2018 • Jordi Pons, Xavier Serra
The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors.
Sound Audio and Speech Processing
4 code implementations • 7 Nov 2017 • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra
The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms.
Sound Audio and Speech Processing
3 code implementations • 19 Jul 2017 • Georgi Dzhambazov, Andre Holzapfel, Ajay Srinivasamurthy, Xavier Serra
The goal of this study is the automatic detection of onsets of the singing voice in polyphonic audio recordings.
1 code implementation • 12 Jul 2017 • Rong Gong, Jordi Pons, Xavier Serra
We approach the singing phrase audio to score matching problem by using phonetic and duration information - with a focus on studying the jingju a cappella singing case.
Sound
1 code implementation • 29 Jun 2017 • Sergio Oramas, Oriol Nieto, Mohamed Sordo, Xavier Serra
Second, track embeddings are learned from the audio signal and available feedback data.
7 code implementations • ICASSP 2018 2017 • Dario Rethage, Jordi Pons, Xavier Serra
In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet.
Sound
3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra
The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.
Sound
no code implementations • LREC 2016 • Sergio Oramas, Luis Espinosa Anke, Mohamed Sordo, Horacio Saggion, Xavier Serra
In this paper we present a gold standard dataset for Entity Linking (EL) in the Music Domain.