no code implementations • 26 Oct 2022 • Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà
Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds.
no code implementations • 23 Oct 2022 • Xiaoyu Liu, Xu Li, Joan Serrà
Single channel target speaker separation (TSS) aims at extracting a speaker's voice from a mixture of multiple talkers given an enrollment utterance of that speaker.
no code implementations • 21 Oct 2022 • Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà
Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source agnostic models that do so.
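As a rough illustration of the permutation invariant training (PIT) idea mentioned above, the sketch below scores every assignment of estimated sources to reference sources and keeps the lowest loss. It is a minimal sketch under stated assumptions, not the method of the listed paper: the function names `mse` and `pit_loss`, the mean-squared-error criterion, and the toy signals are all hypothetical choices made for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, targets):
    """Return the lowest average loss over all source orderings.

    estimates, targets: lists of equal-length signals (lists of floats).
    perm[i] is the index of the estimate assigned to targets[i].
    The per-pair MSE is an assumption; any pairwise loss could be used.
    """
    n = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[p], targets[i]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the model outputs the two sources in swapped order.
targets = [[1.0, 1.0, 1.0], [0.0, -1.0, 0.0]]
estimates = [[0.0, -1.0, 0.0], [1.0, 1.0, 1.0]]
loss, perm = pit_loss(estimates, targets)
```

Because the loss is minimized over orderings, the swapped outputs in the toy example are not penalized. Exhaustive search over permutations grows factorially with the number of sources, so it is only practical for small source counts.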
no code implementations • 7 Jun 2022 • Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini
We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.
no code implementations • 16 Feb 2022 • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà
We investigate which loss functions provide better separations by benchmarking an extensive set of them for music source separation.
no code implementations • 23 Nov 2021 • Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini
Upsampling artifacts are caused by problematic upsampling layers and by spectral replicas that emerge while upsampling.
no code implementations • 30 Sep 2021 • Furkan Yesiler, Marius Miron, Joan Serrà, Emilia Gómez
Version identification (VI) systems now offer accurate and scalable solutions for detecting different renditions of a musical composition, allowing the use of these systems in industrial applications and throughout the wider music ecosystem.
no code implementations • 8 Apr 2021 • Joan Serrà, Santiago Pascual, Jordi Pons
Score-based generative models provide state-of-the-art quality for image and audio synthesis.
no code implementations • 6 Jan 2021 • Furkan Yesiler, Emilio Molina, Joan Serrà, Emilia Gómez
The setlist identification (SLI) task addresses a music recognition use case where the goal is to retrieve the metadata and timestamps for all the tracks played in live music events.
1 code implementation • 27 Oct 2020 • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà
We then compare different upsampling layers, showing that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
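To make the contrast above concrete, the sketch below shows nearest neighbor upsampling of a 1-D signal next to zero insertion, which is one common way to view the stretching step of a strided transposed convolution. It is a simplified sketch and not the code of the listed paper: both function names are hypothetical, and the learnable filters that would normally follow either step are omitted.

```python
def nearest_neighbor_upsample(signal, factor):
    """Repeat each sample `factor` times (nearest neighbor interpolation)."""
    return [s for s in signal for _ in range(factor)]


def zero_insertion_upsample(signal, factor):
    """Insert `factor - 1` zeros after each sample.

    Shown only for contrast: the periodic zeros form a regular pattern
    that a subsequent filter has to smooth out.
    """
    out = []
    for s in signal:
        out.append(s)
        out.extend([0.0] * (factor - 1))
    return out


x = [1.0, 2.0, 3.0]
nn = nearest_neighbor_upsample(x, 2)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
zi = zero_insertion_upsample(x, 2)    # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
```

In a network, either step would typically be followed by a convolution. The nearest neighbor variant keeps that convolution at stride 1 on an already stretched signal.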
1 code implementation • 20 Oct 2020 • Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà
Applications of deep learning to automatic multitrack mixing are largely unexplored.
Audio and Speech Processing • Sound
1 code implementation • 7 Oct 2020 • Furkan Yesiler, Joan Serrà, Emilia Gómez
Version identification systems aim to detect different renditions of the same underlying musical composition (loosely called cover songs).
no code implementations • 1 Oct 2020 • Joan Serrà, Jordi Pons, Santiago Pascual
Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches.
1 code implementation • 28 Oct 2019 • Furkan Yesiler, Joan Serrà, Emilia Gómez
The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece.
Ranked #4 on Cover song identification on YouTube350
2 code implementations • ICLR 2020 • Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F. Núñez, Jordi Luque
Likelihood-based generative models are a promising resource to detect out-of-distribution (OOD) inputs which could compromise the robustness or reliability of a machine learning system.
Ranked #10 on Anomaly Detection on Unlabeled CIFAR-10 vs CIFAR-100
3 code implementations • NeurIPS 2019 • Joan Serrà, Santiago Pascual, Carlos Segura
End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.
1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio
Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.
Ranked #2 on Distant Speech Recognition on DIRHA English WSJ
no code implementations • 6 Apr 2019 • Santiago Pascual, Joan Serrà, Antonio Bonafonte
The speech enhancement task usually consists of removing additive noise or reverberation that partially masks spoken utterances, affecting their intelligibility.
2 code implementations • 24 Oct 2018 • Jordi Pons, Joan Serrà, Xavier Serra
We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.
3 code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà, Jose A. Gonzalez
Most methods of voice restoration for patients suffering from aphonia either produce whispered or monotone speech.
no code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà
The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks.
no code implementations • 7 Jun 2018 • Emilia Gómez, Carlos Castillo, Vicky Charisi, Verónica Dahl, Gustavo Deco, Blagoj Delipetrev, Nicole Dewandre, Miguel Ángel González-Ballester, Fabien Gouyon, José Hernández-Orallo, Perfecto Herrera, Anders Jonsson, Ansgar Koene, Martha Larson, Ramón López de Mántaras, Bertin Martens, Marius Miron, Rubén Moreno-Bote, Nuria Oliver, Antonio Puertas Gallardo, Heike Schweitzer, Nuria Sebastian, Xavier Serra, Joan Serrà, Songül Tolan, Karina Vold
The workshop gathered an interdisciplinary group of experts to establish the state-of-the-art research in the field and a list of future research challenges to be addressed on the topic of human and machine intelligence, algorithms' potential impact on human cognitive capabilities and decision making, and evaluation and regulation needs.
no code implementations • 10 May 2018 • Joan Serrà, Santiago Pascual, Alexandros Karatzoglou
We evaluate the performance of the proposed approach on a well-known time series classification benchmark, considering full adaptation, partial adaptation, and no adaptation of the encoder to the new data type.
2 code implementations • ICML 2018 • Joan Serrà, Dídac Surís, Marius Miron, Alexandros Karatzoglou
In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning.
Ranked #2 on Continual Learning on 20Newsgroup (10 tasks)
no code implementations • 19 Dec 2017 • Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, Joan Serrà
Besides using classical gradient-boosted trees, we demonstrate how to make continual predictions using a recurrent neural network (RNN).
Human-Computer Interaction
3 code implementations • 18 Dec 2017 • Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn
In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.
no code implementations • 13 Jun 2017 • Joan Serrà, Alexandros Karatzoglou
Due to the structure of the data coming from recommendation domains (i.e., one-hot-encoded vectors of item preferences), these algorithms tend to have large input and output dimensionalities that dominate their overall size.
no code implementations • 17 May 2017 • Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, Joan Serrà
We present a practical approach for processing mobile sensor time series data for continual deep learning predictions.
no code implementations • 18 Apr 2017 • Joan Serrà, Ilias Leontiadis, Alexandros Karatzoglou, Konstantina Papagiannaki
Our results indicate that, compared to the best baseline, tree-based models can deliver up to 14% better forecasts for regular hot spots and 153% better forecasts for non-regular hot spots.
20 code implementations • 28 Mar 2017 • Santiago Pascual, Antonio Bonafonte, Joan Serrà
In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.
1 code implementation • 16 Nov 2015 • Joan Serrà, Aleksandar Matic, Josep Luis Arcos, Alexandros Karatzoglou
Finding repeated patterns or motifs in a time series is an important unsupervised task that still has a number of open issues, starting with the definition of a motif.
no code implementations • 6 Mar 2015 • Joan Serrà, Isabel Serra, Álvaro Corral, Josep Lluis Arcos
Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that lowest dissimilarities are particularly affected by this dependency.
no code implementations • 29 Jan 2015 • Joan Serrà, Josep Lluis Arcos
In this article, we propose an innovative standpoint and present a solution coming from it: an anytime multimodal optimization algorithm for time series motif discovery based on particle swarms.
no code implementations • 16 Jan 2014 • Joan Serrà, Josep Lluis Arcos
In particular, the similarity measure is the most essential ingredient of time series clustering and classification systems.