no code implementations • 26 Oct 2022 • Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà
Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds.
no code implementations • 21 Oct 2022 • Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà
Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source agnostic models that do so.
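As a rough illustration of the PIT idea mentioned above, the sketch below scores every assignment of model outputs to reference sources and keeps the lowest loss, so the model is not penalized for emitting sources in a different order. It is a minimal, standard-library example and not the authors' implementation: the function names `mse` and `pit_loss`, the use of mean squared error, and the toy signals are all assumptions made for illustration.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Return (lowest mean loss, best permutation) over all assignments.

    best_perm[i] is the index of the estimate matched to references[i].
    Hypothetical helper: exhaustive search, so only practical for few sources.
    """
    n = len(references)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[p], references[i]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data (assumed): the model outputs the two sources in swapped order.
references = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
estimates = [[-0.9, -1.1, -1.0], [1.0, 0.9, 1.1]]
loss, perm = pit_loss(estimates, references)
print(perm, loss)
```

Here the best permutation pairs the second estimate with the first reference and vice versa, which is why the loss stays small even though the output order is swapped.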
no code implementations • 7 Jun 2022 • Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini
We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.
no code implementations • 16 Feb 2022 • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà
We investigate which loss functions provide better separations by benchmarking an extensive set of them for music source separation.
no code implementations • 23 Nov 2021 • Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini
Upsampling artifacts are caused by problematic upsampling layers and by spectral replicas that emerge while upsampling.
no code implementations • 8 Apr 2021 • Joan Serrà, Santiago Pascual, Jordi Pons
Score-based generative models provide state-of-the-art quality for image and audio synthesis.
1 code implementation • 27 Oct 2020 • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà
We then compare different upsampling layers, showing that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
1 code implementation • 20 Oct 2020 • Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà
Applications of deep learning to automatic multitrack mixing are largely unexplored.
Audio and Speech Processing Sound
no code implementations • 1 Oct 2020 • Joan Serrà, Jordi Pons, Santiago Pascual
Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches.
1 code implementation • 25 Jan 2020 • Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
1 code implementation • 15 Nov 2019 • Tina Raissi, Santiago Pascual, Maurizio Omologo
The candidate time windows are selected from a set of large time intervals, possibly including a sample drop, and by using a preprocessing step.
Sound Audio and Speech Processing I.2.7
2 code implementations • 3 Jun 2019 • David Álvarez, Santiago Pascual, Antonio Bonafonte
This way we feed the acoustic model with speaker acoustically dependent representations that enrich the waveform generation more than discrete embeddings unrelated to these factors.
Sound Audio and Speech Processing
3 code implementations • NeurIPS 2019 • Joan Serrà, Santiago Pascual, Carlos Segura
End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.
1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio
Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.
Ranked #2 on Distant Speech Recognition on DIRHA English WSJ
no code implementations • 6 Apr 2019 • Santiago Pascual, Joan Serrà, Antonio Bonafonte
The speech enhancement task usually consists of removing additive noise or reverberation that partially masks spoken utterances, affecting their intelligibility.
3 code implementations • 25 Mar 2019 • Amanda Duarte, Francisco Roldan, Miquel Tubau, Janna Escur, Santiago Pascual, Amaia Salvador, Eva Mohedano, Kevin McGuinness, Jordi Torres, Xavier Giro-i-Nieto
Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker.
no code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà
The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks.
3 code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà, Jose A. Gonzalez
Most methods of voice restoration for patients suffering from aphonia produce either whispered or monotone speech.
no code implementations • 10 May 2018 • Joan Serrà, Santiago Pascual, Alexandros Karatzoglou
We evaluate the performance of the proposed approach on a well-known time series classification benchmark, considering full adaptation, partial adaptation, and no adaptation of the encoder to the new data type.
3 code implementations • 18 Dec 2017 • Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn
In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.
20 code implementations • 28 Mar 2017 • Santiago Pascual, Antonio Bonafonte, Joan Serrà
In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.
3 code implementations • 29 Aug 2016 • Alberto Montes, Amaia Salvador, Santiago Pascual, Xavier Giro-i-Nieto
This thesis explores different approaches using Convolutional and Recurrent Neural Networks to classify and temporally localize activities in videos. An implementation of these approaches is also proposed.