Search Results for author: Daniel P. W. Ellis

Found 15 papers, 8 papers with code

Dataset balancing can hurt model performance

no code implementations 30 Jun 2023 R. Channing Moore, Daniel P. W. Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal

We find, however, that while balancing improves performance on the public AudioSet evaluation data, it simultaneously hurts performance on an unpublished evaluation set collected under the same conditions.
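"Balancing" here refers to reweighting or resampling training examples so that rare classes are seen as often as common ones. A minimal sketch of one common recipe, inverse-frequency sampling weights, is below; this is illustrative only and may differ from the paper's exact sampling scheme, and the label data is made up:

```python
import numpy as np

# Hypothetical multi-label training set: each example carries a set
# of class indices (not real AudioSet data).
example_labels = [
    {0},      # a common class
    {0},
    {0},
    {1},      # a rare class
    {0, 2},   # a multi-label example
]
num_classes = 3

# Count how many examples mention each class.
counts = np.zeros(num_classes)
for labels in example_labels:
    for c in labels:
        counts[c] += 1

# Inverse-frequency weight per class; an example's weight is the max
# over its labels, so rare-class examples are drawn more often.
class_weight = 1.0 / counts
weights = np.array([max(class_weight[c] for c in labels)
                    for labels in example_labels])
probs = weights / weights.sum()

# Draw a balanced batch of example indices.
rng = np.random.default_rng(0)
batch = rng.choice(len(example_labels), size=4, p=probs)
```

The result the abstract reports is exactly the caveat with such schemes: the sampler optimizes for the class distribution of one evaluation set, which need not transfer to another.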

Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system

no code implementations 14 Oct 2022 Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis

The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset.

Event Segmentation

MuLan: A Joint Embedding of Music Audio and Natural Language

1 code implementation 26 Aug 2022 Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, Daniel P. W. Ellis

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries.

Cross-Modal Retrieval, Music Tagging, +2
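A joint audio-text embedding of this kind is typically trained contrastively: paired audio and text are pulled together in a shared space while mismatched pairs are pushed apart, so free-form text can replace a rigid ontology at retrieval time. A minimal NumPy sketch of a symmetric contrastive (InfoNCE-style) objective follows; it is illustrative only, not MuLan's actual training code, and the embeddings are random stand-ins:

```python
import numpy as np

def info_nce(audio_emb, text_emb, temperature=0.1):
    """Symmetric contrastive loss for a batch of paired audio/text
    embeddings; matching pairs sit on the diagonal of the logits."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature  # (B, B) scaled cosine similarities

    def xent_diag(m):
        # Cross-entropy with the diagonal (the true pairing) as target.
        shifted = m - m.max(axis=1, keepdims=True)
        log_softmax = shifted - np.log(
            np.exp(shifted).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_softmax))

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 8))
loss_matched = info_nce(audio, audio.copy())        # perfectly aligned pairs
loss_shuffled = info_nce(audio, rng.normal(size=(4, 8)))  # unrelated pairs
```

With aligned pairs the diagonal dominates and the loss is near zero; with unrelated pairs it is substantially higher, which is the gradient signal that shapes the shared space.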

Self-Supervised Learning from Automatically Separated Sound Scenes

1 code implementation 5 May 2021 Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings.

Contrastive Learning, Self-Supervised Learning

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

no code implementations ICLR 2021 Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

For evaluation and semi-supervised experiments, we collected human labels for presence of on-screen and off-screen sounds on a small subset of clips.

Scene Understanding

Improving Universal Sound Separation Using Sound Classification

no code implementations 18 Nov 2019 Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis

Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification.

Audio Source Separation, Classification, +2

Audio tagging with noisy labels and minimal supervision

2 code implementations 7 Jun 2019 Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra

The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sound classes.

Audio Tagging, Task 2

Learning Sound Event Classifiers from Web Audio with Noisy Labels

2 code implementations 4 Jan 2019 Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra

To foster the investigation of label noise in sound event classification we present FSDnoisy18k, a dataset containing 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

General Classification, Sound Event Detection

AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

1 code implementation 2 Aug 2018 Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization.

Sound, Audio and Speech Processing
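As a point of reference for what a speech activity (endpointing) detector computes, here is a minimal frame-level energy-threshold detector. This is the classic illustrative baseline, not the models evaluated against AVA-Speech, and the test signal is synthetic:

```python
import numpy as np

def energy_vad(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Flag frames whose log energy is within `threshold_db` of the
    loudest frame. An illustrative baseline only."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame) // hop
    energy = np.array([np.mean(signal[i * hop:i * hop + frame] ** 2)
                       for i in range(n_frames)])
    log_e = 10 * np.log10(energy + 1e-12)  # dB, with floor to avoid log(0)
    return log_e > log_e.max() + threshold_db

# One second of near-silence followed by one second of a 220 Hz tone.
sr = 16000
t = np.arange(sr) / sr
quiet = 0.001 * np.random.default_rng(0).normal(size=sr)
tone = 0.5 * np.sin(2 * np.pi * 220 * t)
active = energy_vad(np.concatenate([quiet, tone]), sr)
```

Energy thresholding fails exactly where the abstract's use cases get hard (music, noise, and reverberation all carry energy), which is why densely labeled data like AVA-Speech matters for training and evaluating stronger detectors.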

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

3 code implementations 26 Jul 2018 Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.

Audio Tagging, Task 2

Unsupervised Learning of Semantic Audio Representations

no code implementations 6 Nov 2017 Aren Jansen, Manoj Plakal, Ratheet Pandya, Daniel P. W. Ellis, Shawn Hershey, Jiayang Liu, R. Channing Moore, Rif A. Saurous

Even in the absence of any explicit semantic annotation, vast collections of audio recordings provide valuable information for learning the categorical structure of sounds.

Audio Classification, General Classification, +1

Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems

5 code implementations 29 Dec 2015 Colin Raffel, Daniel P. W. Ellis

We propose a simplified model of attention which is applicable to feed-forward neural networks and demonstrate that the resulting model can solve the synthetic "addition" and "multiplication" long-term memory problems for sequence lengths which are both longer and more widely varying than the best published results for these tasks.
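The simplified attention described here scores each hidden state independently with a small learnable function, normalizes the scores with a softmax over time, and returns the weighted average of the states as a fixed-length summary. A minimal NumPy sketch of that mechanism, with a random vector standing in for the learned scoring parameters:

```python
import numpy as np

def feed_forward_attention(h, w, b=0.0):
    """h: (T, D) sequence of hidden states; w: (D,) scoring vector.
    Returns the context vector c = sum_t alpha_t * h_t and the weights."""
    e = h @ w + b                    # scalar score per timestep, e_t = a(h_t)
    e = e - e.max()                  # shift for numerical stability
    alpha = np.exp(e) / np.exp(e).sum()  # softmax over time
    c = alpha @ h                    # (D,) attention-weighted average
    return c, alpha

rng = np.random.default_rng(0)
T, D = 6, 4
h = rng.normal(size=(T, D))          # stand-in hidden states
w = rng.normal(size=D)               # stand-in learned parameters
c, alpha = feed_forward_attention(h, w)
```

Because each score depends only on its own timestep, the summary is independent of sequence length, which is what lets a feed-forward model handle the longer and more variable sequences the abstract mentions.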
