Search Results for author: Xavier Serra

Found 55 papers, 40 papers with code

ChoralSynth: Synthetic Dataset of Choral Singing

no code implementations • 14 Nov 2023 • Jyoti Narang, Viviana De La Vega, Xavier Lizarraga, Oscar Mayor, Hector Parra, Jordi Janer, Xavier Serra

Choral singing, a widely practiced form of ensemble singing, lacks comprehensive datasets in the realm of Music Information Retrieval (MIR) research, due to challenges arising from the requirement to curate multitrack recordings.

Information Retrieval, Music Information Retrieval +1

Data leakage in cross-modal retrieval training: A case study

no code implementations • 23 Feb 2023 • Benno Weck, Xavier Serra

In our experiments, we find that the new splits serve as a more challenging benchmark.

Cross-Modal Retrieval, Retrieval +1

FlowGrad: Using Motion for Visual Sound Source Localization

1 code implementation • 15 Nov 2022 • Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes

Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos.

Optical Flow Estimation, Scene Understanding

Multilabel Prototype Generation for Data Reduction in k-Nearest Neighbour classification

1 code implementation • 22 Jul 2022 • Jose J. Valero-Mas, Antonio Javier Gallego, Pablo Alonso-Jiménez, Xavier Serra

Prototype Generation (PG) methods are typically considered for improving the efficiency of the $k$-Nearest Neighbour ($k$NN) classifier when tackling high-size corpora.

Emotion Embedding Spaces for Matching Music to Stories

1 code implementation • 26 Nov 2021 • Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, Xavier Serra

Content creators often use music to enhance their stories, as it can be a powerful tool to convey emotion.

Cross-Modal Retrieval, Metric Learning +1

Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

no code implementations • 14 Oct 2021 • Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra

Having attracted attention only recently, very few works on AAC study the performance of existing pre-trained audio and natural language processing resources.

Audio captioning, Word Embeddings

Soundata: A Python library for reproducible use of audio datasets

no code implementations • 26 Sep 2021 • Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version.

Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks

1 code implementation • 1 Jul 2021 • Eduardo Fonseca, Andres Ferraro, Xavier Serra

Recent studies have put into question the commonly assumed shift invariance property of convolutional networks, showing that small shifts in the input can affect the output predictions substantially.

Self-Supervised Learning from Automatically Separated Sound Scenes

1 code implementation • 5 May 2021 • Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings.

Contrastive Learning, Self-Supervised Learning

Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging

1 code implementation • 30 Jan 2021 • Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra, Dmitry Bogdanov

We present Melon Playlist Dataset, a public dataset of mel-spectrograms for 649,091 tracks and 148,826 associated playlists annotated by 30,652 different tags.

Audio Signal Processing, Collaborative Filtering +6

Unsupervised Contrastive Learning of Sound Event Representations

1 code implementation • 15 Nov 2020 • Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra

Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data, a common scenario in sound event research.

Contrastive Learning, Representation Learning

Multimodal Metric Learning for Tag-based Music Retrieval

1 code implementation • 30 Oct 2020 • Minz Won, Sergio Oramas, Oriol Nieto, Fabien Gouyon, Xavier Serra

In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings.

Cross-Modal Retrieval, Metric Learning +4

Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags

1 code implementation • 27 Oct 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

In this work we propose a method for learning audio representations using an audio autoencoder (AAE), a general word embeddings model (WEM), and a multi-head self-attention (MHA) mechanism.

Representation Learning, TAG +1

FSD50K: An Open Dataset of Human-Labeled Sound Events

8 code implementations • 1 Oct 2020 • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes.

The Freesound Loop Dataset and Annotation Tool

1 code implementation • 26 Aug 2020 • Antonio Ramires, Frederic Font, Dmitry Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, Bo-Yu Chen, Yueh-Kao Wu, Hsu Wei-Han, Xavier Serra

We present the Freesound Loop Dataset (FSLD), a new large-scale dataset of music loops annotated by experts.

Audio and Speech Processing, Sound

Exploring Longitudinal Effects of Session-based Recommendations

1 code implementation • 17 Aug 2020 • Andres Ferraro, Dietmar Jannach, Xavier Serra

Specifically, we analyze to what extent algorithms of different types may lead to concentration effects over time.

Re-Ranking, Session-Based Recommendations

COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

2 code implementations • 15 Jun 2020 • Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features.

Representation Learning

Evaluation of CNN-based Automatic Music Tagging Models

7 code implementations • 1 Jun 2020 • Minz Won, Andres Ferraro, Dmitry Bogdanov, Xavier Serra

Recent advances in deep learning accelerated the development of content-based automatic music tagging systems.

Music Auto-Tagging, Audio and Speech Processing, Sound

Search Result Clustering in Collaborative Sound Collections

no code implementations • 8 Apr 2020 • Xavier Favory, Frederic Font, Xavier Serra

In our work, we propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases.

Clustering

TensorFlow Audio Models in Essentia

no code implementations • 16 Mar 2020 • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra

Essentia is a reference open-source C++/Python library for audio and music analysis.

Music Tagging, TAG

Neural Percussive Synthesis Parameterised by High-Level Timbral Features

1 code implementation • 25 Nov 2019 • António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, Xavier Serra

We present a deep neural network-based methodology for synthesising percussive sounds with control over high-level timbral characteristics of the sounds.

Vocal Bursts Intensity Prediction

How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging

no code implementations • 12 Nov 2019 • Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jay Ho Jeon, Jason Yoon

Automatic tagging of music is an important research topic in Music Information Retrieval, and audio analysis algorithms proposed for this task have achieved improvements with advances in deep learning.

Information Retrieval, Music Auto-Tagging +2

Artist and style exposure bias in collaborative filtering based music recommendations

no code implementations • 12 Nov 2019 • Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jason Yoon

Algorithms have an increasing influence on the music that we consume, and understanding their behavior is fundamental to making sure they give fair exposure to all artists across different styles.

Collaborative Filtering, Music Recommendation

Visualizing and Understanding Self-attention based Music Tagging

no code implementations • 11 Nov 2019 • Minz Won, Sanghyuk Chun, Xavier Serra

Recently, we proposed a self-attention based music tagging model.

Sound, Audio and Speech Processing

Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers

1 code implementation • 26 Oct 2019 • Eduardo Fonseca, Frederic Font, Xavier Serra

We show that these simple methods can be effective in mitigating the effect of label noise, providing an accuracy boost of up to 2.5% when incorporated into two different CNNs, while requiring minimal intervention and computational overhead.

General Classification

musicnn: Pre-trained convolutional neural networks for music audio tagging

4 code implementations • 14 Sep 2019 • Jordi Pons, Xavier Serra

Pronounced as "musician", the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn.

Audio Tagging, Transfer Learning

A hybrid parametric-deep learning approach for sound event localization and detection

1 code implementation • 27 Aug 2019 • Andres Perez-Lopez, Eduardo Fonseca, Xavier Serra

This work describes and discusses an algorithm submitted to the Sound Event Localization and Detection Task of DCASE2019 Challenge.

Sound Event Localization and Detection

Data Augmentation for Instrument Classification Robust to Audio Effects

1 code implementation • 19 Jul 2019 • António Ramires, Xavier Serra

Reusing recorded sounds (sampling) is a key component in Electronic Music Production (EMP), which has been present since its early days and is at the core of genres like hip-hop or jungle.

Sound, Audio and Speech Processing

Toward Interpretable Music Tagging with Self-Attention

2 code implementations • 12 Jun 2019 • Minz Won, Sanghyuk Chun, Xavier Serra

In addition, we demonstrate the interpretability of the proposed architecture with a heat map visualization.

Sound, Audio and Speech Processing

Audio tagging with noisy labels and minimal supervision

2 code implementations • 7 Jun 2019 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra

The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sound classes.

Audio Tagging, Task 2

Skip prediction using boosting trees based on acoustic features of tracks in sessions

1 code implementation • 28 Mar 2019 • Andrés Ferraro, Dmitry Bogdanov, Xavier Serra

The Spotify Sequential Skip Prediction Challenge focuses on predicting if a track in a session will be skipped by the user or not.

Sequential skip prediction

Learning Sound Event Classifiers from Web Audio with Noisy Labels

2 code implementations • 4 Jan 2019 • Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra

To foster the investigation of label noise in sound event classification, we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

General Classification, Sound Event Detection

Facilitating the Manual Annotation of Sounds When Using Large Taxonomies

no code implementations • 21 Nov 2018 • Xavier Favory, Eduardo Fonseca, Frederic Font, Xavier Serra

It enables, for instance, the development of automatic tools for the annotation of large and diverse multimedia collections.

Information Retrieval, Retrieval

End-to-end music source separation: is it possible in the waveform domain?

2 code implementations • 29 Oct 2018 • Francesc Lluís, Jordi Pons, Xavier Serra

Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase.

Music Source Separation

Training neural audio classifiers with few data

2 code implementations • 24 Oct 2018 • Jordi Pons, Joan Serrà, Xavier Serra

We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.

Acoustic Scene Classification, General Classification +2

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

3 code implementations • 26 Jul 2018 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.

Audio Tagging, Task 2

Natural Language Processing for Music Knowledge Discovery

1 code implementation • 6 Jul 2018 • Sergio Oramas, Luis Espinosa-Anke, Francisco Gómez, Xavier Serra

Today, a massive amount of musical knowledge is stored in written form, with testimonies dated as far back as several centuries ago.

Graph Generation, Sentiment Analysis

A Simple Fusion of Deep and Shallow Learning for Acoustic Scene Classification

2 code implementations • 19 Jun 2018 • Eduardo Fonseca, Rong Gong, Xavier Serra

In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine.

Acoustic Scene Classification, Classification +3

Towards an efficient deep learning model for musical onset detection

2 code implementations • 18 Jun 2018 • Rong Gong, Xavier Serra

We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack of hyper-parameter tuning details, (ii) the non-availability of code for training models on other datasets, and (iii) ignoring the network capability when comparing different architectures.

Transfer Learning

Assessing the impact of machine intelligence on human behaviour: an interdisciplinary endeavour

no code implementations • 7 Jun 2018 • Emilia Gómez, Carlos Castillo, Vicky Charisi, Verónica Dahl, Gustavo Deco, Blagoj Delipetrev, Nicole Dewandre, Miguel Ángel González-Ballester, Fabien Gouyon, José Hernández-Orallo, Perfecto Herrera, Anders Jonsson, Ansgar Koene, Martha Larson, Ramón López de Mántaras, Bertin Martens, Marius Miron, Rubén Moreno-Bote, Nuria Oliver, Antonio Puertas Gallardo, Heike Schweitzer, Nuria Sebastian, Xavier Serra, Joan Serrà, Songül Tolan, Karina Vold

The workshop gathered an interdisciplinary group of experts to establish the state-of-the-art research in the field and a list of future research challenges to be addressed on the topic of human and machine intelligence, algorithms' potential impact on human cognitive capabilities and decision making, and evaluation and regulation needs.

Decision Making

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

3 code implementations • 5 Jun 2018 • Rong Gong, Xavier Serra

In the second step, the syllable and phoneme boundaries and labels are inferred hierarchically by using a duration-informed hidden Markov model (HMM).

Sound, Information Retrieval, Audio and Speech Processing

Transfer Learning of Artist Group Factors to Musical Genre Classification

1 code implementation • 5 May 2018 • Jaehun Kim, Minz Won, Xavier Serra, Cynthia C. S. Liem

The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy.

Classification, General Classification +2

Randomly weighted CNNs for (music) audio classification

2 code implementations • 1 May 2018 • Jordi Pons, Xavier Serra

The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors.

Sound, Audio and Speech Processing

End-to-end learning for music audio tagging at scale

4 code implementations • 7 Nov 2017 • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra

The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms.

Sound, Audio and Speech Processing

Metrical-accent Aware Vocal Onset Detection in Polyphonic Audio

3 code implementations • 19 Jul 2017 • Georgi Dzhambazov, Andre Holzapfel, Ajay Srinivasamurthy, Xavier Serra

The goal of this study is the automatic detection of onsets of the singing voice in polyphonic audio recordings.

Position

Audio to score matching by combining phonetic and duration information

1 code implementation • 12 Jul 2017 • Rong Gong, Jordi Pons, Xavier Serra

We approach the singing phrase audio to score matching problem by using phonetic and duration information, with a focus on studying the jingju a cappella singing case.

Sound

A Deep Multimodal Approach for Cold-start Music Recommendation

1 code implementation • 29 Jun 2017 • Sergio Oramas, Oriol Nieto, Mohamed Sordo, Xavier Serra

Second, track embeddings are learned from the audio signal and available feedback data.

Music Recommendation

A Wavenet for Speech Denoising

7 code implementations • ICASSP 2018 • 2017 • Dario Rethage, Jordi Pons, Xavier Serra

In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet.

Sound

Timbre Analysis of Music Audio Signals with Convolutional Neural Networks

3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra

The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.

Sound
