Search Results for author: Eduardo Fonseca

Found 18 papers, 12 papers with code

Dataset balancing can hurt model performance

no code implementations · 30 Jun 2023 · R. Channing Moore, Daniel P. W. Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal

We find, however, that while balancing improves performance on the public AudioSet evaluation data, it simultaneously hurts performance on an unpublished evaluation set collected under the same conditions.
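The balancing in question is typically class-balanced sampling, where rare classes are over-sampled so every class contributes equally in expectation. A minimal sketch of inverse-frequency sample weights (the labels are hypothetical, not from the paper):

```python
from collections import Counter

def balanced_weights(labels):
    """Weight each sample inversely to its class frequency, so each class
    carries the same total weight when drawing a balanced batch."""
    freq = Counter(labels)
    return [1.0 / freq[y] for y in labels]

labels = ["speech", "speech", "speech", "dog_bark"]
weights = balanced_weights(labels)
# "speech" samples each get 1/3, "dog_bark" gets 1, so both classes sum to 1.
```

These weights would feed a weighted sampler; the paper's finding is that this equalization can overfit rare classes at the expense of real-world performance.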

Audiovisual Masked Autoencoders

2 code implementations · ICCV 2023 · Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab

Can we leverage the audiovisual information already present in video to improve self-supervised representation learning?

Ranked #1 on Audio Classification on EPIC-KITCHENS-100 (using extra training data)

Audio Classification · Representation Learning

Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system

no code implementations · 14 Oct 2022 · Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis

The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset.

Event Segmentation

Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks

1 code implementation · 1 Jul 2021 · Eduardo Fonseca, Andres Ferraro, Xavier Serra

Recent studies have put into question the commonly assumed shift invariance property of convolutional networks, showing that small shifts in the input can affect the output predictions substantially.
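The shifts in question break invariance because of strided downsampling, which aliases the input. A toy sketch, independent of the paper's actual models, showing that delaying a signal by one sample changes the output of strided max-pooling:

```python
def maxpool1d(x, size=2, stride=2):
    """Plain strided max-pooling; the stride > 1 is what breaks shift invariance."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]

signal = [0, 0, 1, 0, 0, 1, 0, 0]
shifted = [0] + signal[:-1]   # the same signal delayed by one sample

maxpool1d(signal)   # -> [0, 1, 1, 0]
maxpool1d(shifted)  # -> [0, 1, 0, 1], a different output for the same content
```

A truly shift-invariant system would map both inputs to (shifted versions of) the same output; here the pooled features differ outright.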

Self-Supervised Learning from Automatically Separated Sound Scenes

1 code implementation · 5 May 2021 · Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings.

Contrastive Learning · Self-Supervised Learning

Unsupervised Contrastive Learning of Sound Event Representations

1 code implementation · 15 Nov 2020 · Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra

Self-supervised representation learning can mitigate the limitations of recognition tasks with little manually labeled data but abundant unlabeled data, a common scenario in sound event research.

Contrastive Learning · Representation Learning
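A typical contrastive setup, sketched here in plain Python with made-up embeddings rather than the paper's actual networks, treats two augmented views of the same clip as a positive pair and scores it against negatives with a softmax over temperature-scaled cosine similarities:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """-log softmax probability of the positive pair among all candidates."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))
```

When the anchor and positive embeddings agree and the negatives do not, the loss is near zero; when a negative is more similar than the positive, the loss is large, which is what drives same-clip views together in embedding space.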

What's All the FUSS About Free Universal Sound Separation Data?

no code implementations · 2 Nov 2020 · Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types.

Data Augmentation

FSD50K: An Open Dataset of Human-Labeled Sound Events

8 code implementations · 1 Oct 2020 · Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes.

Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers

1 code implementation · 26 Oct 2019 · Eduardo Fonseca, Frederic Font, Xavier Serra

We show that these simple methods can be effective in mitigating the effect of label noise, providing up to a 2.5% accuracy boost when incorporated into two different CNNs, while requiring minimal intervention and computational overhead.

General Classification
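One family of model-agnostic strategies against label noise discards a fraction of the highest-loss samples during training, on the assumption that mislabeled clips tend to incur the largest losses. A hypothetical sketch (the function name and fraction are illustrative, not the paper's exact method):

```python
def discard_noisy(samples, losses, drop_fraction=0.1):
    """Drop the drop_fraction of samples with the largest per-sample loss,
    keeping the rest in their original order."""
    order = sorted(range(len(samples)), key=lambda i: losses[i])
    n_keep = max(1, int(len(samples) * (1 - drop_fraction)))
    kept = sorted(order[:n_keep])
    return [samples[i] for i in kept]
```

In practice the per-sample losses would come from the classifier itself after a warm-up phase, so the filter adapts as the model improves.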

A hybrid parametric-deep learning approach for sound event localization and detection

1 code implementation · 27 Aug 2019 · Andres Perez-Lopez, Eduardo Fonseca, Xavier Serra

This work describes and discusses an algorithm submitted to the Sound Event Localization and Detection Task of DCASE2019 Challenge.

Sound Event Localization and Detection

Audio tagging with noisy labels and minimal supervision

2 code implementations · 7 Jun 2019 · Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra

The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sound classes.

Audio Tagging · Task 2

Learning Sound Event Classifiers from Web Audio with Noisy Labels

2 code implementations · 4 Jan 2019 · Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra

To foster the investigation of label noise in sound event classification we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

General Classification · Sound Event Detection

Facilitating the Manual Annotation of Sounds When Using Large Taxonomies

no code implementations · 21 Nov 2018 · Xavier Favory, Eduardo Fonseca, Frederic Font, Xavier Serra

It enables, for instance, the development of automatic tools for the annotation of large and diverse multimedia collections.

Information Retrieval · Retrieval

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

3 code implementations · 26 Jul 2018 · Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.

Audio Tagging · Task 2

A Simple Fusion of Deep and Shallow Learning for Acoustic Scene Classification

2 code implementations · 19 Jun 2018 · Eduardo Fonseca, Rong Gong, Xavier Serra

In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine.

Acoustic Scene Classification · Classification +3
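Such a fusion can be as simple as a weighted average of the two models' class posteriors (late fusion). A minimal sketch with made-up probabilities; the equal weighting is illustrative, not the paper's tuned value:

```python
def fuse(p_cnn, p_gbm, weight=0.5):
    """Weighted average of two models' class-probability vectors."""
    return [weight * a + (1 - weight) * b for a, b in zip(p_cnn, p_gbm)]

def predict(p_cnn, p_gbm, classes, weight=0.5):
    """Pick the class with the highest fused probability."""
    fused = fuse(p_cnn, p_gbm, weight)
    return classes[max(range(len(fused)), key=fused.__getitem__)]

# Hypothetical scene posteriors from a CNN and a gradient boosting machine:
predict([0.6, 0.4], [0.2, 0.8], ["park", "metro"])
```

Averaging posteriors lets the deep and shallow models correct each other's errors when their mistakes are uncorrelated, which is the usual motivation for this kind of fusion.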
