Search Results for author: Xavier Serra

Found 55 papers, 40 papers with code

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

1 code implementation • 14 Feb 2024 • Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora

APNet allows prototypes' reconstruction to waveforms for interpretability relying on the nearest training data samples.

Audio Classification

WikiMuTe: A web-sourced dataset of semantic descriptions for music audio

no code implementations • 14 Dec 2023 • Benno Weck, Holger Kirchhoff, Peter Grosche, Xavier Serra

The model is evaluated on two tasks: tag-based music retrieval and music auto-tagging.

Cross-Modal Retrieval Information Retrieval +4

ChoralSynth: Synthetic Dataset of Choral Singing

no code implementations • 14 Nov 2023 • Jyoti Narang, Viviana De La Vega, Xavier Lizarraga, Oscar Mayor, Hector Parra, Jordi Janer, Xavier Serra

Choral singing, a widely practiced form of ensemble singing, lacks comprehensive datasets in the realm of Music Information Retrieval (MIR) research, due to challenges arising from the requirement to curate multitrack recordings.

Information Retrieval Music Information Retrieval +1

Data leakage in cross-modal retrieval training: A case study

no code implementations • 23 Feb 2023 • Benno Weck, Xavier Serra

In our experiments, we find that the new splits serve as a more challenging benchmark.

Cross-Modal Retrieval Retrieval +1

FlowGrad: Using Motion for Visual Sound Source Localization

1 code implementation • 15 Nov 2022 • Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes

Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos.

Optical Flow Estimation Scene Understanding

Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval

no code implementations • 6 Oct 2022 • Benno Weck, Miguel Pérez Fernández, Holger Kirchhoff, Xavier Serra

We present an analysis of large-scale pretrained deep learning models used for cross-modal (text-to-audio) retrieval.

Metric Learning Retrieval +2

Multilabel Prototype Generation for Data Reduction in k-Nearest Neighbour classification

1 code implementation • 22 Jul 2022 • Jose J. Valero-Mas, Antonio Javier Gallego, Pablo Alonso-Jiménez, Xavier Serra

Prototype Generation (PG) methods are typically considered for improving the efficiency of the $k$-Nearest Neighbour ($k$NN) classifier when tackling high-size corpora.

Emotion Embedding Spaces for Matching Music to Stories

1 code implementation • 26 Nov 2021 • Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, Xavier Serra

Content creators often use music to enhance their stories, as it can be a powerful tool to convey emotion.

Cross-Modal Retrieval Metric Learning +1

Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

no code implementations • 14 Oct 2021 • Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra

Having attracted attention only recently, very few works on AAC study the performance of existing pre-trained audio and natural language processing resources.

Audio captioning Word Embeddings

Soundata: A Python library for reproducible use of audio datasets

no code implementations • 26 Sep 2021 • Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version.

Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks

1 code implementation • 1 Jul 2021 • Eduardo Fonseca, Andres Ferraro, Xavier Serra

Recent studies have put into question the commonly assumed shift invariance property of convolutional networks, showing that small shifts in the input can affect the output predictions substantially.

LoopNet: Musical Loop Synthesis Conditioned On Intuitive Musical Parameters

1 code implementation • 21 May 2021 • Pritish Chandna, António Ramires, Xavier Serra, Emilia Gómez

Loops, seamlessly repeatable musical segments, are a cornerstone of modern music production.

Information Retrieval Music Information Retrieval +1

Self-Supervised Learning from Automatically Separated Sound Scenes

1 code implementation • 5 May 2021 • Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings.

Contrastive Learning Self-Supervised Learning

Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging

1 code implementation • 30 Jan 2021 • Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra, Dmitry Bogdanov

We present Melon Playlist Dataset, a public dataset of mel-spectrograms for 649,091 tracks and 148,826 associated playlists annotated by 30,652 different tags.

Audio Signal Processing Collaborative Filtering +6

Unsupervised Contrastive Learning of Sound Event Representations

1 code implementation • 15 Nov 2020 • Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra

Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data, a common scenario in sound event research.

Contrastive Learning Representation Learning

Multimodal Metric Learning for Tag-based Music Retrieval

1 code implementation • 30 Oct 2020 • Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra

In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings.

Cross-Modal Retrieval Metric Learning +4

Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags

1 code implementation • 27 Oct 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

In this work we propose a method for learning audio representations using an audio autoencoder (AAE), a general word embeddings model (WEM), and a multi-head self-attention (MHA) mechanism.

Representation Learning TAG +1

FSD50K: An Open Dataset of Human-Labeled Sound Events

8 code implementations • 1 Oct 2020 • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes.

The Freesound Loop Dataset and Annotation Tool

1 code implementation • 26 Aug 2020 • Antonio Ramires, Frederic Font, Dmitry Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, Bo-Yu Chen, Yueh-Kao Wu, Hsu Wei-Han, Xavier Serra

We present the Freesound Loop Dataset (FSLD), a new large-scale dataset of music loops annotated by experts.

Audio and Speech Processing Sound

Exploring Longitudinal Effects of Session-based Recommendations

1 code implementation • 17 Aug 2020 • Andres Ferraro, Dietmar Jannach, Xavier Serra

Specifically, we analyze to what extent algorithms of different types may lead to concentration effects over time.

Re-Ranking Session-Based Recommendations

COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

2 code implementations • 15 Jun 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features.

Representation Learning

Evaluation of CNN-based Automatic Music Tagging Models

7 code implementations • 1 Jun 2020 • Minz Won, Andres Ferraro, Dmitry Bogdanov, Xavier Serra

Recent advances in deep learning accelerated the development of content-based automatic music tagging systems.

Ranked #1 on Music Auto-Tagging on MagnaTagATune (clean)

Music Auto-Tagging Audio and Speech Processing Sound

Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

no code implementations • 2 May 2020 • Eduardo Fonseca, Shawn Hershey, Manoj Plakal, Daniel P. W. Ellis, Aren Jansen, R. Channing Moore, Xavier Serra

The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets.

Missing Labels

Search Result Clustering in Collaborative Sound Collections

no code implementations • 8 Apr 2020 • Xavier Favory, Frederic Font, Xavier Serra

In our work, we propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases.

Clustering

TensorFlow Audio Models in Essentia

no code implementations • 16 Mar 2020 • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra

Essentia is a reference open-source C++/Python library for audio and music analysis.

Music Tagging TAG

Neural Percussive Synthesis Parameterised by High-Level Timbral Features

1 code implementation • 25 Nov 2019 • António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, Xavier Serra

We present a deep neural network-based methodology for synthesising percussive sounds with control over high-level timbral characteristics of the sounds.

Vocal Bursts Intensity Prediction

How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging

no code implementations • 12 Nov 2019 • Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jay Ho Jeon, Jason Yoon

Automatic tagging of music is an important research topic in Music Information Retrieval, and audio analysis algorithms proposed for this task have achieved improvements with advances in deep learning.

Information Retrieval Music Auto-Tagging +2

Artist and style exposure bias in collaborative filtering based music recommendations

no code implementations • 12 Nov 2019 • Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jason Yoon

Algorithms have an increasing influence on the music that we consume, and understanding their behavior is fundamental to making sure they give fair exposure to all artists across different styles.

Collaborative Filtering Music Recommendation

Visualizing and Understanding Self-attention based Music Tagging

no code implementations • 11 Nov 2019 • Minz Won, Sanghyuk Chun, Xavier Serra

Recently, we proposed a self-attention based music tagging model.

Sound Audio and Speech Processing

Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers

1 code implementation • 26 Oct 2019 • Eduardo Fonseca, Frederic Font, Xavier Serra

We show that these simple methods can be effective in mitigating the effect of label noise, providing an accuracy boost of up to 2.5% when incorporated into two different CNNs, while requiring minimal intervention and computational overhead.

General Classification

musicnn: Pre-trained convolutional neural networks for music audio tagging

4 code implementations • 14 Sep 2019 • Jordi Pons, Xavier Serra

Pronounced as "musician", the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn.

Audio Tagging Transfer Learning

A hybrid parametric-deep learning approach for sound event localization and detection

1 code implementation • 27 Aug 2019 • Andres Perez-Lopez, Eduardo Fonseca, Xavier Serra

This work describes and discusses an algorithm submitted to the Sound Event Localization and Detection Task of DCASE2019 Challenge.

Sound Event Localization and Detection

Data Augmentation for Instrument Classification Robust to Audio Effects

1 code implementation • 19 Jul 2019 • António Ramires, Xavier Serra

Reusing recorded sounds (sampling) is a key component in Electronic Music Production (EMP), which has been present since its early days and is at the core of genres like hip-hop or jungle.

Sound Audio and Speech Processing

Toward Interpretable Music Tagging with Self-Attention

2 code implementations • 12 Jun 2019 • Minz Won, Sanghyuk Chun, Xavier Serra

In addition, we demonstrate the interpretability of the proposed architecture with a heat map visualization.

Sound Audio and Speech Processing

Audio tagging with noisy labels and minimal supervision

2 code implementations • 7 Jun 2019 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra

The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sound classes.

Audio Tagging Task 2

Skip prediction using boosting trees based on acoustic features of tracks in sessions

1 code implementation • 28 Mar 2019 • Andrés Ferraro, Dmitry Bogdanov, Xavier Serra

The Spotify Sequential Skip Prediction Challenge focuses on predicting if a track in a session will be skipped by the user or not.

Sequential skip prediction

Learning Sound Event Classifiers from Web Audio with Noisy Labels

2 code implementations • 4 Jan 2019 • Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra

To foster the investigation of label noise in sound event classification, we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

General Classification Sound Event Detection

Facilitating the Manual Annotation of Sounds When Using Large Taxonomies

no code implementations • 21 Nov 2018 • Xavier Favory, Eduardo Fonseca, Frederic Font, Xavier Serra

It enables, for instance, the development of automatic tools for the annotation of large and diverse multimedia collections.

Information Retrieval Retrieval

End-to-end music source separation: is it possible in the waveform domain?

2 code implementations • 29 Oct 2018 • Francesc Lluís, Jordi Pons, Xavier Serra

Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase.

Ranked #26 on Music Source Separation on MUSDB18

Music Source Separation

Training neural audio classifiers with few data

2 code implementations • 24 Oct 2018 • Jordi Pons, Joan Serrà, Xavier Serra

We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.

Acoustic Scene Classification General Classification +2

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

3 code implementations • 26 Jul 2018 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.

Audio Tagging Task 2

Natural Language Processing for Music Knowledge Discovery

1 code implementation • 6 Jul 2018 • Sergio Oramas, Luis Espinosa-Anke, Francisco Gómez, Xavier Serra

Today, a massive amount of musical knowledge is stored in written form, with testimonies dated as far back as several centuries ago.

Graph Generation Sentiment Analysis

A Simple Fusion of Deep and Shallow Learning for Acoustic Scene Classification

2 code implementations • 19 Jun 2018 • Eduardo Fonseca, Rong Gong, Xavier Serra

In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine.

Acoustic Scene Classification Classification +3

Towards an efficient deep learning model for musical onset detection

2 code implementations • 18 Jun 2018 • Rong Gong, Xavier Serra

We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack of hyper-parameter tuning details, (ii) the non-availability of code for training models on other datasets, and (iii) ignoring the network capability when comparing different architectures.

Transfer Learning

Assessing the impact of machine intelligence on human behaviour: an interdisciplinary endeavour

no code implementations • 7 Jun 2018 • Emilia Gómez, Carlos Castillo, Vicky Charisi, Verónica Dahl, Gustavo Deco, Blagoj Delipetrev, Nicole Dewandre, Miguel Ángel González-Ballester, Fabien Gouyon, José Hernández-Orallo, Perfecto Herrera, Anders Jonsson, Ansgar Koene, Martha Larson, Ramón López de Mántaras, Bertin Martens, Marius Miron, Rubén Moreno-Bote, Nuria Oliver, Antonio Puertas Gallardo, Heike Schweitzer, Nuria Sebastian, Xavier Serra, Joan Serrà, Songül Tolan, Karina Vold

The workshop gathered an interdisciplinary group of experts to establish the state-of-the-art research in the field and a list of future research challenges to be addressed on the topic of human and machine intelligence, algorithms' potential impact on human cognitive capabilities and decision making, and evaluation and regulation needs.

Decision Making

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

3 code implementations • 5 Jun 2018 • Rong Gong, Xavier Serra

In the second step, the syllable and phoneme boundaries and labels are inferred hierarchically by using a duration-informed hidden Markov model (HMM).

Sound Information Retrieval Audio and Speech Processing

Transfer Learning of Artist Group Factors to Musical Genre Classification

1 code implementation • 5 May 2018 • Jaehun Kim, Minz Won, Xavier Serra, Cynthia C. S. Liem

The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy.

Classification General Classification +2

Randomly weighted CNNs for (music) audio classification

2 code implementations • 1 May 2018 • Jordi Pons, Xavier Serra

The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors.

Sound Audio and Speech Processing

End-to-end learning for music audio tagging at scale

4 code implementations • 7 Nov 2017 • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra

The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms.

Sound Audio and Speech Processing

Metrical-accent Aware Vocal Onset Detection in Polyphonic Audio

3 code implementations • 19 Jul 2017 • Georgi Dzhambazov, Andre Holzapfel, Ajay Srinivasamurthy, Xavier Serra

The goal of this study is the automatic detection of onsets of the singing voice in polyphonic audio recordings.

Position

Audio to score matching by combining phonetic and duration information

1 code implementation • 12 Jul 2017 • Rong Gong, Jordi Pons, Xavier Serra

We approach the singing phrase audio to score matching problem by using phonetic and duration information, with a focus on studying the jingju a cappella singing case.

Sound

A Deep Multimodal Approach for Cold-start Music Recommendation

1 code implementation • 29 Jun 2017 • Sergio Oramas, Oriol Nieto, Mohamed Sordo, Xavier Serra

Second, track embeddings are learned from the audio signal and available feedback data.

Music Recommendation

A Wavenet for Speech Denoising

7 code implementations • ICASSP 2018 2017 • Dario Rethage, Jordi Pons, Xavier Serra

In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet.

Sound

Timbre Analysis of Music Audio Signals with Convolutional Neural Networks

3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra

The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.

Sound

ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain

no code implementations • LREC 2016 • Sergio Oramas, Luis Espinosa Anke, Mohamed Sordo, Horacio Saggion, Xavier Serra

In this paper we present a gold standard dataset for Entity Linking (EL) in the Music Domain.

Entity Linking
