no code implementations • 17 Aug 2023 • Julia Wilkins, Justin Salamon, Magdalena Fuentes, Juan Pablo Bello, Oriol Nieto
We show that our system, trained using our automatic data curation pipeline, significantly outperforms baselines trained on in-the-wild data on the task of HQ SFX retrieval for video.
1 code implementation • 15 Nov 2022 • Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes
Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos.
no code implementations • 20 Mar 2022 • Sangeeta Srivastava, Ho-Hsiang Wu, Joao Rulff, Magdalena Fuentes, Mark Cartwright, Claudio Silva, Anish Arora, Juan Pablo Bello
To accomplish this, we imitate channel effects by injecting perturbations into the audio signal and measure the shift in the new (perturbed) embeddings with three distance measures, making the evaluation domain-dependent but not task-dependent.
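The perturb-and-measure idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_embedding` is a hypothetical stand-in for a learned embedding model, the gain-plus-noise perturbation is one possible way to imitate a channel effect, and the three distances shown (Euclidean, Manhattan, cosine) are common choices that are not necessarily the ones used in the paper.

```python
import math
import random


def toy_embedding(signal, n_chunks=4):
    """Hypothetical stand-in for a learned embedding: mean absolute amplitude per chunk."""
    size = len(signal) // n_chunks
    return [
        sum(abs(x) for x in signal[i * size:(i + 1) * size]) / size
        for i in range(n_chunks)
    ]


def perturb(signal, gain=0.5, noise_std=0.01, seed=0):
    """Imitate a channel effect (assumption): attenuate and add Gaussian noise."""
    rng = random.Random(seed)
    return [gain * x + rng.gauss(0.0, noise_std) for x in signal]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# One second of a 440 Hz sine at 16 kHz as a toy input signal.
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
clean = toy_embedding(signal)
shifted = toy_embedding(perturb(signal))

# Embedding shift under each distance; no task labels are involved.
shift = {
    "euclidean": euclidean(clean, shifted),
    "manhattan": manhattan(clean, shifted),
    "cosine": cosine_distance(clean, shifted),
}
```

Because the shift is computed directly between embeddings of clean and perturbed audio, it depends on the audio domain and the perturbation but not on any downstream task.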
1 code implementation • 21 Oct 2021 • Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello
We propose Wav2CLIP, a robust audio representation learning method that distills from Contrastive Language-Image Pre-training (CLIP).
no code implementations • 26 Sep 2021 • Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello
Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version.
1 code implementation • 6 May 2021 • Aurora Cramer, Mark Cartwright, Fatemeh Pishdadian, Juan Pablo Bello
While the estimation of what sound sources are, when they occur, and from where they originate has been well-studied, the estimation of how loud these sound sources are has often been overlooked.
no code implementations • 5 Feb 2021 • Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang
Deep learning is very data hungry, and supervised learning especially requires massive amounts of labeled data to work well.
no code implementations • 11 Sep 2020 • Mark Cartwright, Jason Cramer, Ana Elisa Mendez Mendez, Yu Wang, Ho-Hsiang Wu, Vincent Lostanlen, Magdalena Fuentes, Graham Dove, Charlie Mydlarz, Justin Salamon, Oded Nov, Juan Pablo Bello
In this article, we describe our data collection procedure and propose evaluation metrics for multilabel classification of urban sound tags.
no code implementations • 2 Mar 2020 • Vincent Lostanlen, Alice Cohen-Hadria, Juan Pablo Bello
With the aim of constructing a biologically plausible model of machine listening, we study the representation of a multicomponent stationary signal by a wavelet scattering network.
no code implementations • 1 Nov 2019 • Vincent Lostanlen, Kaitlin Palmer, Elly Knight, Christopher Clark, Holger Klinck, Andrew Farnsworth, Tina Wong, Jason Cramer, Juan Pablo Bello
This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN).
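As a rough illustration of the approach described above, the sketch below applies the standard PCEN formula, PCEN(t, f) = (E(t, f) / (eps + M(t, f))^alpha + delta)^r - delta^r with M(t, f) = (1 - s) M(t - 1, f) + s E(t, f), to a toy spectrogram and then pools each frame across channels. The parameter values, the choice of max pooling, the toy data, and the detection threshold are all assumptions made for this example, not settings taken from the paper.

```python
def pcen(spec, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization.

    spec: list of frames, each a list of nonnegative channel energies.
    The smoother M is a first-order IIR filter, initialized to the first frame.
    """
    smooth = list(spec[0])
    out = []
    for frame in spec:
        smooth = [(1 - s) * m + s * e for m, e in zip(smooth, frame)]
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smooth)
        ])
    return out


def pooled_activity(pcen_frames):
    """Pool each frame across channels (max pooling is an assumption here)."""
    return [max(frame) for frame in pcen_frames]


# Toy spectrogram: 100 frames, 3 channels of stationary background energy,
# with a short burst in channel 1 at frames 60 to 62.
spec = [[1.0, 1.0, 1.0] for _ in range(100)]
for t in range(60, 63):
    spec[t][1] = 50.0

activity = pooled_activity(pcen(spec))

THRESHOLD = 1.0  # hypothetical detection threshold, chosen for this toy data
detected = [t for t, a in enumerate(activity) if a > THRESHOLD]
```

No labels are used anywhere: the stationary background is normalized to a roughly constant level, so the transient burst stands out in the pooled curve.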
1 code implementation • 22 Oct 2019 • Vincent Lostanlen, Sripathi Sridhar, Brian McFee, Andrew Farnsworth, Juan Pablo Bello
To explain the consonance of octaves, music psychologists represent pitch as a helix where the azimuth and axial coordinate correspond to pitch class and pitch height, respectively.
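The helix described above can be made concrete with a small sketch. It assumes a log-frequency parameterization with one full turn per octave; the function name, the 440 Hz reference, and the unit radius are illustrative choices for this example.

```python
import math


def helix_coordinates(freq_hz, ref_hz=440.0, radius=1.0):
    """Map a frequency to (x, y, height) on a pitch helix.

    height  = number of octaves above the reference (pitch height)
    azimuth = fractional part of height, as an angle (pitch class)
    """
    height = math.log2(freq_hz / ref_hz)
    azimuth = 2 * math.pi * (height % 1.0)
    return (radius * math.cos(azimuth), radius * math.sin(azimuth), height)


a4 = helix_coordinates(440.0)
a5 = helix_coordinates(880.0)          # one octave above A4
e5 = helix_coordinates(440.0 * 1.5)    # a perfect fifth above A4
```

Here `a4` and `a5` share the same azimuth and differ only in height, so notes an octave apart sit directly above one another on the helix, while `e5` lands at a different angle.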
no code implementations • 20 Jun 2019 • Jong Wook Kim, Juan Pablo Bello
Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements in transcription performance.
1 code implementation • 20 May 2019 • Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello
As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise.
no code implementations • 1 Nov 2018 • Jong Wook Kim, Rachel Bittner, Aparna Kumar, Juan Pablo Bello
The recent success of raw audio waveform synthesis models like WaveNet motivates a new approach for music synthesis, in which the entire process (creating audio samples from a score and instrument information) is modeled using generative neural networks.
2 code implementations • 26 Apr 2018 • Brian McFee, Justin Salamon, Juan Pablo Bello
In this work, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality.
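In a multiple instance learning setup like the one described above, frame-level predictions have to be aggregated into one clip-level prediction that can be compared with the weak label. The sketch below shows three common pooling choices (max, mean, and a softmax-weighted average with a sharpness parameter `alpha`). These are generic illustrations; the function names and example values are assumptions and should not be read as the paper's exact operators.

```python
import math


def max_pool(frame_probs):
    """Clip is positive if any single frame is confidently positive."""
    return max(frame_probs)


def mean_pool(frame_probs):
    """Clip score is the average frame score."""
    return sum(frame_probs) / len(frame_probs)


def softmax_pool(frame_probs, alpha=1.0):
    """Weighted average where weights grow with the frame score.

    alpha = 0 reduces to mean pooling; large alpha approaches max pooling.
    """
    weights = [math.exp(alpha * p) for p in frame_probs]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, frame_probs)) / total


# Frame-level probabilities for one sound class in a short excerpt.
# Only the clip-level label (present or absent) is known during training.
frame_probs = [0.1, 0.2, 0.9, 0.3]

clip_scores = {
    "max": max_pool(frame_probs),
    "mean": mean_pool(frame_probs),
    "softmax": softmax_pool(frame_probs, alpha=5.0),
}
```

The softmax variant interpolates between the two extremes, which is useful when events can be either brief or spread across the whole excerpt.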
1 code implementation • 17 Feb 2018 • Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello
To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics.
5 code implementations • 15 Aug 2016 • Justin Salamon, Juan Pablo Bello
We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a "shallow" dictionary learning model with augmentation.
Ranked #6 on Environmental Sound Classification on UrbanSound8K (using extra training data)