Search Results for author: Santiago Pascual

Found 26 papers, 13 papers with code

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

3 code implementations • 29 Aug 2016 • Alberto Montes, Amaia Salvador, Santiago Pascual, Xavier Giro-i-Nieto

This thesis explores different approaches using Convolutional and Recurrent Neural Networks to classify and temporally localize activities in videos; furthermore, an implementation to achieve this is proposed.

Action Detection • Activity Detection

SEGAN: Speech Enhancement Generative Adversarial Network

20 code implementations • 28 Mar 2017 • Santiago Pascual, Antonio Bonafonte, Joan Serrà

In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.

Generative Adversarial Network • Speech Enhancement

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

3 code implementations • 18 Dec 2017 • Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn

In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.

Generative Adversarial Network • Speech Enhancement

Towards a universal neural network encoder for time series

no code implementations • 10 May 2018 • Joan Serrà, Santiago Pascual, Alexandros Karatzoglou

We evaluate the performance of the proposed approach on a well-known time series classification benchmark, considering full adaptation, partial adaptation, and no adaptation of the encoder to the new data type.

Time Series • Time Series Analysis

Self-Attention Linguistic-Acoustic Decoder

no code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà

The conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models like recurrent neural networks.

Speech Synthesis

Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

3 code implementations • 31 Aug 2018 • Santiago Pascual, Antonio Bonafonte, Joan Serrà, Jose A. Gonzalez

Most methods of voice restoration for patients suffering from aphonia produce either whispered or monotone speech.

Speech Enhancement

Towards Generalized Speech Enhancement with Generative Adversarial Networks

no code implementations • 6 Apr 2019 • Santiago Pascual, Joan Serrà, Antonio Bonafonte

The speech enhancement task usually consists of removing additive noise or reverberation that partially mask spoken utterances, affecting their intelligibility.

Generative Adversarial Network • Speech Enhancement

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

1 code implementation • 6 Apr 2019 • Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure.

Distant Speech Recognition

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

3 code implementations • NeurIPS 2019 • Joan Serrà, Santiago Pascual, Carlos Segura

End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.

Audio Generation • Voice Conversion

Problem-Agnostic Speech Embeddings for Multi-Speaker Text-to-Speech with SampleRNN

2 code implementations • 3 Jun 2019 • David Álvarez, Santiago Pascual, Antonio Bonafonte

In this way, we feed the acoustic model with speaker-dependent acoustic representations that enrich waveform generation more than discrete embeddings unrelated to these factors.

Sound • Audio and Speech Processing

Sample Drop Detection for Distant-speech Recognition with Asynchronous Devices Distributed in Space

1 code implementation • 15 Nov 2019 • Tina Raissi, Santiago Pascual, Maurizio Omologo

The candidate time windows are selected, via a preprocessing step, from a set of large time intervals that possibly include a sample drop.

Sound • Audio and Speech Processing • I.2.7

Multi-task self-supervised learning for Robust Speech Recognition

1 code implementation • 25 Jan 2020 • Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.

Robust Speech Recognition • Self-Supervised Learning

SESQA: semi-supervised learning for speech quality assessment

no code implementations • 1 Oct 2020 • Joan Serrà, Jordi Pons, Santiago Pascual

Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches.

Automatic multitrack mixing with a differentiable mixing console of neural audio effects

1 code implementation • 20 Oct 2020 • Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà

Applications of deep learning to automatic multitrack mixing are largely unexplored.

Audio and Speech Processing • Sound

Upsampling artifacts in neural audio synthesis

1 code implementation • 27 Oct 2020 • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà

We then compare different upsampling layers, showing that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions which are prone to introduce tonal artifacts.

Audio Signal Processing • Audio Synthesis
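
The contrast between upsamplers can be illustrated with a small NumPy sketch (an illustration of the general phenomenon, not the paper's code): zero insertion, the operation at the core of stride-2 transposed convolutions, places a spectral replica of the signal at Nyquist, a tonal artifact the learned filter must then suppress, whereas nearest-neighbor upsampling leaves a constant signal DC-only.

```python
import numpy as np

x = np.ones(64)  # constant (DC-only) input signal

# Zero-insertion upsampling by 2, as performed inside a stride-2
# transposed convolution before filtering: interleave zeros.
zero_stuffed = np.zeros(128)
zero_stuffed[::2] = x

# Nearest-neighbor upsampling by 2: repeat each sample.
nearest = np.repeat(x, 2)

spec_zs = np.abs(np.fft.rfft(zero_stuffed))
spec_nn = np.abs(np.fft.rfft(nearest))

# Zero insertion mirrors the DC component up to Nyquist (a tone)...
print(round(spec_zs[-1], 6))  # 64.0
# ...while nearest-neighbor upsampling introduces no such replica.
print(round(spec_nn[-1], 6))  # 0.0
```

The same replicas appear for any input frequency, which is why filters after zero-insertion upsamplers must work hard to remove them, and why they surface as audible tonal artifacts when they fail.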

On tuning consistent annealed sampling for denoising score matching

no code implementations • 8 Apr 2021 • Joan Serrà, Santiago Pascual, Jordi Pons

Score-based generative models provide state-of-the-art quality for image and audio synthesis.

Audio Synthesis • Denoising

Upsampling layers for music source separation

no code implementations • 23 Nov 2021 • Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini

Upsampling artifacts are caused by problematic upsampling layers and by spectral replicas that emerge while upsampling.

Music Source Separation

On loss functions and evaluation metrics for music source separation

no code implementations • 16 Feb 2022 • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà

We investigate which loss functions provide better separations by benchmarking an extensive set of them for music source separation.

Audio Source Separation • Benchmarking
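
One loss commonly benchmarked for source separation is the scale-invariant signal-to-distortion ratio (SI-SDR); the following NumPy sketch is an illustration of that standard metric, not the paper's implementation, and all variable names are ours:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio, in dB (higher is better)."""
    # Project the estimate onto the reference to find the best scale.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference   # scaled reference: the "signal" part
    noise = estimate - target    # residual: the "distortion" part
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)                 # 1 s of "source" at 16 kHz
noisy = ref + 0.1 * rng.standard_normal(16000)   # mild distortion
noisier = ref + 0.5 * rng.standard_normal(16000) # heavy distortion

print(si_sdr(noisy, ref) > si_sdr(noisier, ref))  # True: less distortion scores higher
print(si_sdr(2.0 * ref, ref) > 50)                # True: rescaling barely hurts the score
```

The projection step is what makes the measure scale-invariant: a separation that is correct up to gain is not penalized, which is usually the desired behavior when only the waveform shape matters.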

Universal Speech Enhancement with Score-based Diffusion

no code implementations • 7 Jun 2022 • Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini

We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.

Speech Enhancement

Adversarial Permutation Invariant Training for Universal Sound Separation

no code implementations • 21 Oct 2022 • Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà

Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source-agnostic models that do so.
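
PIT evaluates the loss under every assignment of model outputs to reference sources and trains on the minimum, so the model need not output sources in any fixed order. A toy NumPy version (a sketch of the generic PIT criterion, not this paper's adversarial variant):

```python
import numpy as np
from itertools import permutations

def pit_mse(estimates, references):
    """Minimum MSE over all output-to-source assignments.

    estimates, references: arrays of shape (n_sources, n_samples).
    Returns (best_loss, best_perm) where output i matches reference best_perm[i].
    """
    n = len(references)
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n)):
        loss = np.mean((estimates - references[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Two "sources"; the model happens to output them in swapped order.
t = np.linspace(0, 2 * np.pi, 100)
refs = np.stack([np.sin(t), np.cos(t)])
ests = refs[[1, 0]]  # swapped outputs

loss, perm = pit_mse(ests, refs)
print(perm)           # (1, 0): PIT recovers the correct assignment
print(loss < 1e-12)   # True: zero loss once permuted
```

Exhaustive search over permutations is factorial in the number of sources, which is fine for the two-to-four sources typical of separation models; larger counts require assignment algorithms such as Hungarian matching.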

Full-band General Audio Synthesis with Score-based Diffusion

no code implementations • 26 Oct 2022 • Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà

Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds.

Audio Synthesis

Mono-to-stereo through parametric stereo generation

no code implementations • 26 Jun 2023 • Joan Serrà, Davide Scaini, Santiago Pascual, Daniel Arteaga, Jordi Pons, Jeroen Breebaart, Giulio Cengarle

Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain a realistic spatial imaging with a specific panning of sound elements.

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

1 code implementation • 18 Aug 2023 • Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, Weidong Cai

In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM.

Audio Generation

GASS: Generalizing Audio Source Separation with Large-scale Data

no code implementations • 29 Sep 2023 • Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà

Here, we study a single general audio source separation (GASS) model trained to separate speech, music, and sound events in a supervised fashion with a large-scale dataset.

Audio Source Separation • Speech Separation
