Filler Word Detection and Classification: A Dataset and Benchmark

28 Mar 2022 · Ge Zhu, Juan-Pablo Caceres, Justin Salamon

Filler words such as 'uh' or 'um' are sounds or words people use to signal they are pausing to think. Finding and removing filler words from recordings is a common and tedious task in media editing. Automatically detecting and classifying filler words could greatly aid in this task, but few studies have been published on this problem to date. A key reason is the absence of a dataset with annotated filler words for model training and evaluation. In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts such as breaths, laughter, and word repetitions. We propose a pipeline that leverages voice activity detection (VAD) and automatic speech recognition (ASR) to detect filler candidates and a classifier to distinguish between filler word types. We evaluate our proposed pipeline on PodcastFillers, compare it to several baselines, and present a detailed ablation study. In particular, we evaluate the importance of using ASR and how it compares to a transcription-free approach resembling keyword spotting. We show that our pipeline obtains state-of-the-art results, and that leveraging ASR strongly outperforms a keyword spotting approach. We make PodcastFillers publicly available, in the hope that our work serves as a benchmark for future research.
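
The abstract does not spell out how VAD and ASR are combined to find filler candidates. One plausible reading, offered here only as an assumption, is that speech regions found by VAD but not covered by any ASR-transcribed word are treated as candidates and then passed to the classifier. The sketch below illustrates that interval-subtraction idea on plain timestamp lists; the function names, the `min_duration` threshold, and the input format are all hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def subtract_intervals(speech: List[Interval], words: List[Interval]) -> List[Interval]:
    """Return the parts of `speech` that are not covered by any interval in `words`."""
    words = sorted(words)
    remaining: List[Interval] = []
    for start, end in sorted(speech):
        cursor = start  # left edge of the region not yet explained by a word
        for w_start, w_end in words:
            if w_end <= cursor:
                continue  # word lies entirely before the uncovered region
            if w_start >= end:
                break  # word starts after this speech segment ends
            if w_start > cursor:
                remaining.append((cursor, w_start))
            cursor = max(cursor, w_end)
            if cursor >= end:
                break
        if cursor < end:
            remaining.append((cursor, end))
    return remaining


def filler_candidates(
    speech: List[Interval],
    words: List[Interval],
    min_duration: float = 0.1,  # hypothetical threshold, not from the paper
) -> List[Interval]:
    """Keep uncovered speech regions long enough to plausibly hold a filler."""
    gaps = subtract_intervals(speech, words)
    return [(s, e) for s, e in gaps if e - s >= min_duration]


if __name__ == "__main__":
    vad_segments = [(0.0, 3.0), (4.0, 5.0)]                # assumed VAD output
    asr_words = [(0.5, 1.0), (1.0, 2.0), (4.0, 5.0)]       # assumed ASR word timestamps
    print(filler_candidates(vad_segments, asr_words))
```

In a full system each returned interval would be handed to a classifier that decides whether it is a filler and of which type; that step is omitted here because the abstract gives no detail about the classifier.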

Code

gzhu06/PodcastFillers_Utils official

Tasks

Classification

Keyword Spotting

Sound Event Localization and Detection

Datasets

Introduced in the Paper:

PodcastFillers

Used in the Paper:

LibriSpeech

AudioSet

VCTK

Results from the Paper

Ranked #1 on Sound Event Localization and Detection on PodcastFillers

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Sound Event Localization and Detection	PodcastFillers	AVC-FillerNet	event-based F1 score	92.8	# 1
Sound Event Localization and Detection	PodcastFillers	VC-FillerNet	event-based F1 score	71.0	# 2
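
The metric above is an event-based F1 score. As a rough illustration of how such a score is commonly computed, the sketch below counts a predicted event as correct when it has the same label as an unmatched reference event and its onset falls within a tolerance collar. The collar value (0.2 s), the onset-only matching rule, and the greedy matching order are assumptions made for this example; the exact evaluation protocol behind the numbers in the table is not given on this page.

```python
from typing import List, Tuple

Event = Tuple[float, float, str]  # (onset_sec, offset_sec, label)


def event_based_f1(
    reference: List[Event],
    estimated: List[Event],
    collar: float = 0.2,  # assumed onset tolerance in seconds
) -> float:
    """Greedy onset-collar matching followed by F1 = 2TP / (2TP + FP + FN)."""
    matched_ref = set()
    true_positives = 0
    for est_onset, _est_offset, est_label in sorted(estimated):
        for i, (ref_onset, _ref_offset, ref_label) in enumerate(reference):
            if i in matched_ref:
                continue  # each reference event can be matched only once
            if ref_label == est_label and abs(ref_onset - est_onset) <= collar:
                matched_ref.add(i)
                true_positives += 1
                break
    false_positives = len(estimated) - true_positives
    false_negatives = len(reference) - true_positives
    denom = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denom if denom else 0.0


if __name__ == "__main__":
    ref = [(1.0, 1.4, "uh"), (3.0, 3.5, "um"), (6.0, 6.3, "uh")]
    est = [(1.1, 1.5, "uh"), (3.05, 3.4, "uh"), (9.0, 9.2, "um")]
    print(f"event-based F1: {event_based_f1(ref, est):.3f}")
```

Toolkits for sound event detection evaluation typically offer further options, such as also requiring the offset to fall within a collar, so scores from this sketch are not directly comparable to the values reported in the table.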
