Search Results for author: Jordi Pons

Found 29 papers, 16 papers with code

Long-form music generation with latent diffusion

no code implementations • 16 Apr 2024 • Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure.

Music Generation

Fast Timing-Conditioned Latent Audio Diffusion

2 code implementations • 7 Feb 2024 • Zach Evans, CJ Carr, Josiah Taylor, Scott H. Hawley, Jordi Pons

Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding.

Audio Generation

GASS: Generalizing Audio Source Separation with Large-scale Data

no code implementations • 29 Sep 2023 • Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà

Here, we study a single general audio source separation (GASS) model trained to separate speech, music, and sound events in a supervised fashion with a large-scale dataset.

Audio Source Separation, Speech Separation

Mono-to-stereo through parametric stereo generation

no code implementations • 26 Jun 2023 • Joan Serrà, Davide Scaini, Santiago Pascual, Daniel Arteaga, Jordi Pons, Jeroen Breebaart, Giulio Cengarle

Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain a realistic spatial imaging with a specific panning of sound elements.

Towards Robust Image-in-Audio Deep Steganography

1 code implementation • 9 Mar 2023 • Jaume Ros, Margarita Geleta, Jordi Pons, Xavier Giro-i-Nieto

The field of steganography has experienced a surge of interest due to the recent advancements in AI-powered techniques, particularly in the context of multimodal setups that enable the concealment of signals within signals of a different nature.

Image Reconstruction

Full-band General Audio Synthesis with Score-based Diffusion

no code implementations • 26 Oct 2022 • Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà

Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds.

Audio Synthesis, Diversity

Adversarial Permutation Invariant Training for Universal Sound Separation

no code implementations • 21 Oct 2022 • Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà

Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source agnostic models that do so.

Universal Speech Enhancement with Score-based Diffusion

1 code implementation • 7 Jun 2022 • Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini

We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.

Speech Enhancement

On loss functions and evaluation metrics for music source separation

no code implementations • 16 Feb 2022 • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà

We investigate which loss functions provide better separations via benchmarking an extensive set of those for music source separation.

Audio Source Separation, Benchmarking, +1 more

Upsampling layers for music source separation

no code implementations • 23 Nov 2021 • Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini

Upsampling artifacts are caused by problematic upsampling layers and due to spectral replicas that emerge while upsampling.

Music Source Separation

On tuning consistent annealed sampling for denoising score matching

no code implementations • 8 Apr 2021 • Joan Serrà, Santiago Pascual, Jordi Pons

Score-based generative models provide state-of-the-art quality for image and audio synthesis.

Audio Synthesis, Denoising

Multichannel-based learning for audio object extraction

no code implementations • 11 Feb 2021 • Daniel Arteaga, Jordi Pons

The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata.

Sound, Audio and Speech Processing

On permutation invariant training for speech source separation

no code implementations • 9 Feb 2021 • Xiaoyu Liu, Jordi Pons

We study permutation invariant training (PIT), which targets the permutation ambiguity problem for speaker-independent source separation models.

Clustering, Speaker Separation

Upsampling artifacts in neural audio synthesis

1 code implementation • 27 Oct 2020 • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà

We then compare different upsampling layers, showing that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions which are prone to introduce tonal artifacts.

Audio Signal Processing, Audio Synthesis

Automatic multitrack mixing with a differentiable mixing console of neural audio effects

1 code implementation • 20 Oct 2020 • Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà

Applications of deep learning to automatic multitrack mixing are largely unexplored.

Audio and Speech Processing, Sound

SESQA: semi-supervised learning for speech quality assessment

no code implementations • 1 Oct 2020 • Joan Serrà, Jordi Pons, Santiago Pascual

Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches.

FSD50K: An Open Dataset of Human-Labeled Sound Events

8 code implementations • 1 Oct 2020 • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes.

TensorFlow Audio Models in Essentia

no code implementations • 16 Mar 2020 • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra

Essentia is a reference open-source C++/Python library for audio and music analysis.

Music Tagging, TAG

An empirical study of Conv-TasNet

1 code implementation • 20 Feb 2020 • Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy, Vivek Kumar

Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.


musicnn: Pre-trained convolutional neural networks for music audio tagging

4 code implementations • 14 Sep 2019 • Jordi Pons, Xavier Serra

Pronounced as "musician", the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn.

Audio Tagging, Transfer Learning

End-to-end music source separation: is it possible in the waveform domain?

2 code implementations • 29 Oct 2018 • Francesc Lluís, Jordi Pons, Xavier Serra

Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase.

Music Source Separation

Training neural audio classifiers with few data

2 code implementations • 24 Oct 2018 • Jordi Pons, Joan Serrà, Xavier Serra

We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.

Acoustic Scene Classification, General Classification, +2 more

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

3 code implementations • 26 Jul 2018 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.

Audio Tagging, Task 2

Randomly weighted CNNs for (music) audio classification

2 code implementations • 1 May 2018 • Jordi Pons, Xavier Serra

The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors.

Sound, Audio and Speech Processing

End-to-end learning for music audio tagging at scale

4 code implementations • 7 Nov 2017 • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra

The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms.

Sound, Audio and Speech Processing

Audio to score matching by combining phonetic and duration information

1 code implementation • 12 Jul 2017 • Rong Gong, Jordi Pons, Xavier Serra

We approach the singing phrase audio to score matching problem by using phonetic and duration information - with a focus on studying the jingju a cappella singing case.


A Wavenet for Speech Denoising

7 code implementations • ICASSP 2018 • 2017 • Dario Rethage, Jordi Pons, Xavier Serra

In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet.


Timbre Analysis of Music Audio Signals with Convolutional Neural Networks

3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra

The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.

