Search Results for author: Juan Pablo Bello

Found 17 papers, 8 papers with code

Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries

no code implementations • 17 Aug 2023 • Julia Wilkins, Justin Salamon, Magdalena Fuentes, Juan Pablo Bello, Oriol Nieto

We show that our system, trained using our automatic data curation pipeline, significantly outperforms baselines trained on in-the-wild data on the task of HQ SFX retrieval for video.

Contrastive Learning, Retrieval

FlowGrad: Using Motion for Visual Sound Source Localization

1 code implementation • 15 Nov 2022 • Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes

Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos.

Optical Flow Estimation, Scene Understanding

A Study on Robustness to Perturbations for Representations of Environmental Sound

no code implementations • 20 Mar 2022 • Sangeeta Srivastava, Ho-Hsiang Wu, Joao Rulff, Magdalena Fuentes, Mark Cartwright, Claudio Silva, Anish Arora, Juan Pablo Bello

To accomplish this, we imitate channel effects by injecting perturbations to the audio signal and measure the shift in the new (perturbed) embeddings with three distance measures, making the evaluation domain-dependent but not task-dependent.

FAD, Transfer Learning
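
The evaluation idea in the entry above (perturb the audio, embed both versions, measure how far the embedding moved) can be illustrated with a minimal sketch. The snippet does not name the three distance measures, so cosine and Euclidean distance are used here purely as stand-ins, and `toy_embed` and `attenuate` are hypothetical placeholders for a real embedding model and a real channel effect.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def embedding_shift(embed, signal, perturb):
    """Embed a clean and a perturbed signal and report how far apart they land."""
    clean = embed(signal)
    shifted = embed(perturb(signal))
    return {
        "cosine": cosine_distance(clean, shifted),
        "euclidean": euclidean_distance(clean, shifted),
    }


# Hypothetical stand-ins: a two-dimensional "embedding" and a gain change.
def toy_embed(signal):
    return [sum(signal) / len(signal), max(signal) - min(signal)]


def attenuate(signal):
    return [0.5 * x for x in signal]


shift = embedding_shift(toy_embed, [0.0, 1.0, 2.0, 3.0], attenuate)
```

With this toy embedding, a pure gain change rescales the vector without rotating it, so the cosine distance stays near zero while the Euclidean distance does not. Different measures can therefore rank the same perturbation differently.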

Wav2CLIP: Learning Robust Audio Representations From CLIP

1 code implementation • 21 Oct 2021 • Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP).

Cross-Modal Retrieval, Image Generation, +3

Soundata: A Python library for reproducible use of audio datasets

no code implementations • 26 Sep 2021 • Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version.

Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes

1 code implementation • 6 May 2021 • Aurora Cramer, Mark Cartwright, Fatemeh Pishdadian, Juan Pablo Bello

While the estimation of what sound sources are, when they occur, and from where they originate has been well-studied, the estimation of how loud these sound sources are has been often overlooked.

SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context

no code implementations • 11 Sep 2020 • Mark Cartwright, Jason Cramer, Ana Elisa Mendez Mendez, Yu Wang, Ho-Hsiang Wu, Vincent Lostanlen, Magdalena Fuentes, Graham Dove, Charlie Mydlarz, Justin Salamon, Oded Nov, Juan Pablo Bello

In this article, we describe our data collection procedure and propose evaluation metrics for multilabel classification of urban sound tags.

One or Two Components? The Scattering Transform Answers

no code implementations • 2 Mar 2020 • Vincent Lostanlen, Alice Cohen-Hadria, Juan Pablo Bello

With the aim of constructing a biologically plausible model of machine listening, we study the representation of a multicomponent stationary signal by a wavelet scattering network.

Vocal Bursts Valence Prediction

Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization

no code implementations • 1 Nov 2019 • Vincent Lostanlen, Kaitlin Palmer, Elly Knight, Christopher Clark, Holger Klinck, Andrew Farnsworth, Tina Wong, Jason Cramer, Juan Pablo Bello

This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN).

Noise Estimation, speech-recognition, +1
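
As a rough illustration of the entry above, per-channel energy normalization divides each channel's energy by a slowly smoothed version of itself and then applies root compression; pooling the result across channels gives a per-frame detection curve. The sketch below uses the commonly cited PCEN form. The parameter values and the choice of mean pooling over channels are assumptions made for illustration, not settings taken from the paper.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to a list of frames, each a list of non-negative channel energies.

    M[t] = (1 - s) * M[t-1] + s * E[t]          (per-channel smoothing)
    PCEN[t] = (E[t] / (eps + M[t])**alpha + delta)**r - delta**r
    """
    smoothed = list(energy[0])  # initialise the smoother with the first frame
    normalized = []
    for frame in energy:
        smoothed = [(1.0 - s) * m + s * e for m, e in zip(smoothed, frame)]
        normalized.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return normalized


def detection_curve(pcen_frames):
    """Pool PCEN magnitudes across channels (mean pooling is an assumption here)."""
    return [sum(frame) / len(frame) for frame in pcen_frames]


# Toy input: stationary background with one transient event at frame 50.
frames = [[1.0, 1.0] for _ in range(100)]
frames[50] = [100.0, 1.0]
curve = detection_curve(pcen(frames))
```

Because the smoother tracks the stationary background, the curve stays flat until the transient arrives and peaks at the event frame, which is the property an unsupervised detector can threshold.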

Learning the helix topology of musical pitch

1 code implementation • 22 Oct 2019 • Vincent Lostanlen, Sripathi Sridhar, Brian McFee, Andrew Farnsworth, Juan Pablo Bello

To explain the consonance of octaves, music psychologists represent pitch as a helix where azimuth and axial coordinate correspond to pitch class and pitch height respectively.
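
The helix described above can be made concrete with a small sketch that maps a pitch to three-dimensional coordinates: the azimuth encodes pitch class (one full turn per octave) and the axial coordinate encodes pitch height. The use of MIDI note numbers and the `radius` and `rise_per_octave` parameters are illustrative assumptions, not part of the paper's method.

```python
import math


def pitch_to_helix(midi_pitch, radius=1.0, rise_per_octave=1.0):
    """Map a MIDI pitch number to (x, y, z) coordinates on a helix."""
    # Azimuth: position within the octave, i.e. pitch class.
    azimuth = 2.0 * math.pi * (midi_pitch % 12) / 12.0
    # Axial coordinate: overall pitch height, rising steadily with pitch.
    height = rise_per_octave * midi_pitch / 12.0
    return (radius * math.cos(azimuth), radius * math.sin(azimuth), height)


c4 = pitch_to_helix(60)  # middle C
c5 = pitch_to_helix(72)  # one octave above
```

Notes an octave apart share the same `x` and `y` values and differ only in `z`, so they sit directly above one another on the helix. That geometric closeness is how the model accounts for the perceived similarity of octaves.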

Adversarial Learning for Improved Onsets and Frames Music Transcription

no code implementations • 20 Jun 2019 • Jong Wook Kim, Juan Pablo Bello

Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance.

Information Retrieval, Music Information Retrieval, +2

Robust sound event detection in bioacoustic sensor networks

1 code implementation • 20 May 2019 • Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello

As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise.

Data Augmentation, Event Detection, +1

Neural Music Synthesis for Flexible Timbre Control

no code implementations • 1 Nov 2018 • Jong Wook Kim, Rachel Bittner, Aparna Kumar, Juan Pablo Bello

The recent success of raw audio waveform synthesis models like WaveNet motivates a new approach for music synthesis, in which the entire process --- creating audio samples from a score and instrument information --- is modeled using generative neural networks.

Adaptive pooling operators for weakly labeled sound event detection

2 code implementations • 26 Apr 2018 • Brian McFee, Justin Salamon, Juan Pablo Bello

In this work, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality.

Event Detection, Multiple Instance Learning, +2
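
In the multiple instance learning setting described in the entry above, a model produces one score per frame, but the label applies to the whole excerpt, so the frame scores must be pooled into a single clip-level prediction. The sketch below shows one softmax-weighted pooling operator with a sharpness parameter `alpha`. It is offered as an example of an adaptive pooling operator under that framing, not as a definitive reproduction of the paper's implementation.

```python
import math


def softmax_pool(frame_probs, alpha=1.0):
    """Pool per-frame probabilities into one clip-level probability.

    Each frame is weighted by softmax(alpha * p). With alpha = 0 this is the
    mean; as alpha grows it approaches the max.
    """
    # Subtract the largest exponent for numerical stability.
    peak = max(alpha * p for p in frame_probs)
    weights = [math.exp(alpha * p - peak) for p in frame_probs]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, frame_probs)) / total


frame_probs = [0.1, 0.2, 0.9]
mean_like = softmax_pool(frame_probs, alpha=0.0)
soft = softmax_pool(frame_probs, alpha=1.0)
max_like = softmax_pool(frame_probs, alpha=100.0)
```

Treating `alpha` as a tunable or learnable value lets the pooling sit anywhere between mean pooling, which suits long sustained sounds, and max pooling, which suits brief events.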

CREPE: A Convolutional Representation for Pitch Estimation

1 code implementation • 17 Feb 2018 • Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello

To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics.

Information Retrieval, Music Information Retrieval, +1

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

5 code implementations • 15 Aug 2016 • Justin Salamon, Juan Pablo Bello

We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a "shallow" dictionary learning model with augmentation.

Ranked #6 on Environmental Sound Classification on UrbanSound8K (using extra training data)

Data Augmentation, Dictionary Learning, +3
