Search Results for author: Joan Serrà

Found 34 papers, 13 papers with code

Full-band General Audio Synthesis with Score-based Diffusion

no code implementations • 26 Oct 2022 • Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà

Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds.

Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation

no code implementations • 23 Oct 2022 • Xiaoyu Liu, Xu Li, Joan Serrà

Single-channel target speaker separation (TSS) aims at extracting a speaker's voice from a mixture of multiple talkers given an enrollment utterance of that speaker.

Speaker Identification, Speaker Separation

Adversarial Permutation Invariant Training for Universal Sound Separation

no code implementations • 21 Oct 2022 • Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà

Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source-agnostic models that do so.

Universal Speech Enhancement with Score-based Diffusion

no code implementations • 7 Jun 2022 • Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini

We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.

Speech Enhancement

On loss functions and evaluation metrics for music source separation

no code implementations • 16 Feb 2022 • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà

We investigate which loss functions provide better separations via benchmarking an extensive set of those for music source separation.

Audio Source Separation, Benchmarking, +1

Upsampling layers for music source separation

no code implementations • 23 Nov 2021 • Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini

Upsampling artifacts are caused by problematic upsampling layers and by spectral replicas that emerge while upsampling.

Music Source Separation

Assessing Algorithmic Biases for Musical Version Identification

no code implementations • 30 Sep 2021 • Furkan Yesiler, Marius Miron, Joan Serrà, Emilia Gómez

Version identification (VI) systems now offer accurate and scalable solutions for detecting different renditions of a musical composition, allowing the use of these systems in industrial applications and throughout the wider music ecosystem.

Information Retrieval, Management, +1

On tuning consistent annealed sampling for denoising score matching

no code implementations • 8 Apr 2021 • Joan Serrà, Santiago Pascual, Jordi Pons

Score-based generative models provide state-of-the-art quality for image and audio synthesis.


Investigating the efficacy of music version retrieval systems for setlist identification

no code implementations • 6 Jan 2021 • Furkan Yesiler, Emilio Molina, Joan Serrà, Emilia Gómez

The setlist identification (SLI) task addresses a music recognition use case where the goal is to retrieve the metadata and timestamps for all the tracks played in live music events.


Upsampling artifacts in neural audio synthesis

1 code implementation • 27 Oct 2020 • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà

We then compare different upsampling layers, showing that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.

Audio Signal Processing

Automatic multitrack mixing with a differentiable mixing console of neural audio effects

1 code implementation • 20 Oct 2020 • Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà

Applications of deep learning to automatic multitrack mixing are largely unexplored.

Audio and Speech Processing, Sound

Less is more: Faster and better music version identification with embedding distillation

1 code implementation • 7 Oct 2020 • Furkan Yesiler, Joan Serrà, Emilia Gómez

Version identification systems aim to detect different renditions of the same underlying musical composition (loosely called cover songs).

Dimensionality Reduction, Retrieval

SESQA: semi-supervised learning for speech quality assessment

no code implementations • 1 Oct 2020 • Joan Serrà, Jordi Pons, Santiago Pascual

Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches.

Accurate and Scalable Version Identification Using Musically-Motivated Embeddings

1 code implementation • 28 Oct 2019 • Furkan Yesiler, Joan Serrà, Emilia Gómez

The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece.

Cover song identification

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

3 code implementations • NeurIPS 2019 • Joan Serrà, Santiago Pascual, Carlos Segura

End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.

Audio Generation, Voice Conversion

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.

Distant Speech Recognition

Towards Generalized Speech Enhancement with Generative Adversarial Networks

no code implementations • 6 Apr 2019 • Santiago Pascual, Joan Serrà, Antonio Bonafonte

The speech enhancement task usually consists of removing additive noise or reverberation that partially mask spoken utterances, affecting their intelligibility.

Speech Enhancement

Training neural audio classifiers with few data

2 code implementations • 24 Oct 2018 • Jordi Pons, Joan Serrà, Xavier Serra

We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.

Acoustic Scene Classification, General Classification, +2

Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

3 code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà, Jose A. Gonzalez

Most methods of voice restoration for patients suffering from aphonia produce either whispered or monotone speech.

Speech Enhancement

Self-Attention Linguistic-Acoustic Decoder

no code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà

The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks.

Speech Synthesis

Assessing the impact of machine intelligence on human behaviour: an interdisciplinary endeavour

no code implementations • 7 Jun 2018 • Emilia Gómez, Carlos Castillo, Vicky Charisi, Verónica Dahl, Gustavo Deco, Blagoj Delipetrev, Nicole Dewandre, Miguel Ángel González-Ballester, Fabien Gouyon, José Hernández-Orallo, Perfecto Herrera, Anders Jonsson, Ansgar Koene, Martha Larson, Ramón López de Mántaras, Bertin Martens, Marius Miron, Rubén Moreno-Bote, Nuria Oliver, Antonio Puertas Gallardo, Heike Schweitzer, Nuria Sebastian, Xavier Serra, Joan Serrà, Songül Tolan, Karina Vold

The workshop gathered an interdisciplinary group of experts to establish the state of the art of research in the field and a list of future research challenges to be addressed on the topic of human and machine intelligence, algorithms' potential impact on human cognitive capabilities and decision making, and evaluation and regulation needs.

Decision Making

Towards a universal neural network encoder for time series

no code implementations • 10 May 2018 • Joan Serrà, Santiago Pascual, Alexandros Karatzoglou

We evaluate the performance of the proposed approach on a well-known time series classification benchmark, considering full adaptation, partial adaptation, and no adaptation of the encoder to the new data type.

Time Series Analysis, Time Series Classification

Overcoming catastrophic forgetting with hard attention to the task

2 code implementations • ICML 2018 • Joan Serrà, Dídac Surís, Marius Miron, Alexandros Karatzoglou

In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning.

Continual Learning, Hard Attention

Continual Prediction of Notification Attendance with Classical and Deep Network Approaches

no code implementations • 19 Dec 2017 • Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, Joan Serrà

Besides using classical gradient-boosted trees, we demonstrate how to make continual predictions using a recurrent neural network (RNN).

Human-Computer Interaction

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

3 code implementations • 18 Dec 2017 • Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn

In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.

Speech Enhancement

Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks

no code implementations • 13 Jun 2017 • Joan Serrà, Alexandros Karatzoglou

Due to the structure of the data coming from recommendation domains (i.e., one-hot-encoded vectors of item preferences), these algorithms tend to have large input and output dimensionalities that dominate their overall size.

Hot or not? Forecasting cellular network hot spots using sector performance indicators

no code implementations • 18 Apr 2017 • Joan Serrà, Ilias Leontiadis, Alexandros Karatzoglou, Konstantina Papagiannaki

Our results indicate that, compared to the best baseline, tree-based models can deliver up to 14% better forecasts for regular hot spots and 153% better forecasts for non-regular hot spots.

SEGAN: Speech Enhancement Generative Adversarial Network

20 code implementations • 28 Mar 2017 • Santiago Pascual, Antonio Bonafonte, Joan Serrà

In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.

Speech Enhancement

A genetic algorithm to discover flexible motifs with support

1 code implementation • 16 Nov 2015 • Joan Serrà, Aleksandar Matic, Josep Luis Arcos, Alexandros Karatzoglou

Finding repeated patterns or motifs in a time series is an important unsupervised task that still has a number of open issues, starting with the definition of a motif.

Time Series Analysis

Ranking and significance of variable-length similarity-based time series motifs

no code implementations • 6 Mar 2015 • Joan Serrà, Isabel Serra, Álvaro Corral, Josep Lluis Arcos

Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that lowest dissimilarities are particularly affected by this dependency.

Time Series Analysis

Particle swarm optimization for time series motif discovery

no code implementations • 29 Jan 2015 • Joan Serrà, Josep Lluis Arcos

In this article, we propose an innovative standpoint and present a solution coming from it: an anytime multimodal optimization algorithm for time series motif discovery based on particle swarms.

Time Series Streams

An Empirical Evaluation of Similarity Measures for Time Series Classification

no code implementations • 16 Jan 2014 • Joan Serrà, Josep Lluis Arcos

In particular, the similarity measure is the most essential ingredient of time series clustering and classification systems.

Classification, Clustering, +3
