no code implementations • 8 Mar 2022 • Olga Slizovskaia, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux
Existing systems for sound event localization and detection (SELD) typically operate by estimating a source location for all classes at every time instant.
1 code implementation • 14 Jun 2020 • Juan F. Montesinos, Olga Slizovskaia, Gloria Haro
In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks, such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task.
Audio Source Separation · Audio-Visual Synchronization · +1 · Audio and Speech Processing · Databases · Sound
1 code implementation • 8 Apr 2020 • Olga Slizovskaia, Gloria Haro, Emilia Gómez
In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated.
Sound · Audio and Speech Processing
no code implementations • 6 Apr 2020 • Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen
Both acoustic and visual information influence human perception of speech.
1 code implementation • ICLR 2020 • Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F. Núñez, Jordi Luque
Likelihood-based generative models are a promising resource to detect out-of-distribution (OOD) inputs which could compromise the robustness or reliability of a machine learning system.
Ranked #10 on Anomaly Detection on Unlabeled CIFAR-10 vs CIFAR-100
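The core idea above — scoring inputs by their likelihood under a generative model trained on in-distribution data — can be sketched minimally. The example below is illustrative only: a diagonal Gaussian stands in for a learned generative model, and all names and shapes are hypothetical.

```python
import numpy as np

# Minimal sketch: negative log-likelihood under a model fit on
# in-distribution data serves as an OOD score. A diagonal Gaussian is a
# stand-in for a learned likelihood-based generative model.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1000, 4))        # in-distribution data
mu, sigma = train.mean(axis=0), train.std(axis=0)

def nll(x):
    """Negative log-likelihood of x under the fitted diagonal Gaussian."""
    z = (x - mu) / sigma
    return 0.5 * np.sum(z**2 + np.log(2 * np.pi * sigma**2), axis=-1)

in_dist = rng.normal(0.0, 1.0, size=(100, 4))
out_dist = rng.normal(5.0, 1.0, size=(100, 4))       # shifted -> OOD
print(nll(out_dist).mean() > nll(in_dist).mean())    # OOD scores higher
```

In practice the paper studies why raw likelihoods alone can fail for OOD detection; this toy only shows the basic scoring mechanism being analyzed.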
no code implementations • 3 Jul 2019 • Olga Slizovskaia, Emilia Gómez, Gloria Haro
We also propose a technique for measuring the similarity between activation maps and audio features that are typically presented in the form of a matrix, such as chromagrams or spectrograms.
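One simple way to compare two such matrices is shown below. This is a generic sketch, not the paper's specific metric: both the activation map and the chromagram are flattened to vectors and compared with cosine similarity, and all shapes and variable names are illustrative.

```python
import numpy as np

# Sketch (not the paper's exact technique): compare a CNN activation map
# with an audio feature matrix such as a chromagram via cosine similarity.
def matrix_cosine(a, b):
    """Cosine similarity between two equally-shaped matrices."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
chromagram = rng.random((12, 100))                     # 12 pitch classes x 100 frames
activation = chromagram + 0.1 * rng.random((12, 100))  # correlated activation map
print(matrix_cosine(chromagram, activation) > 0.9)     # strongly similar
```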
no code implementations • 5 Nov 2018 • Olga Slizovskaia, Leo Kim, Gloria Haro, Emilia Gómez
Can we perform an end-to-end music source separation with a variable number of sources using a deep learning model?
3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra
The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.
Sound
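A common design choice when tailoring CNNs to spectrograms is to use filters that span the full frequency axis and slide only along time, so each filter responds to a spectral (timbre-like) pattern. The sketch below illustrates this idea with plain numpy; the filter shapes and sizes are hypothetical, not taken from the paper.

```python
import numpy as np

# Illustrative sketch of a "full-height" timbral filter: it covers every
# mel band and is convolved only along the time axis of a log-mel
# spectrogram, yielding one response per time position.
def timbre_conv(log_mel, filt):
    """Valid convolution of an (n_mels, w) filter along the time axis."""
    n_mels, n_frames = log_mel.shape
    fm, w = filt.shape
    assert fm == n_mels, "filter must span the whole frequency axis"
    return np.array([np.sum(log_mel[:, t:t + w] * filt)
                     for t in range(n_frames - w + 1)])

rng = np.random.default_rng(0)
log_mel = rng.random((96, 50))   # 96 mel bands x 50 time frames (illustrative)
filt = rng.random((96, 5))       # full-height filter, 5 frames wide
out = timbre_conv(log_mel, filt)
print(out.shape)                 # prints (46,): one response per time step
```

Because the filter already covers all frequencies, the output is one-dimensional in time, which is the property that makes such filters suited to capturing timbre rather than localized time-frequency patterns.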