ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

19 Sep 2023 · Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi

Diffusion models are instrumental in text-to-audio (TTA) generation. Unfortunately, they suffer from slow inference due to an excessive number of queries to the underlying denoising network per generation. To address this bottleneck, we introduce ConsistencyTTA, a framework requiring only a single non-autoregressive network query, thereby accelerating TTA by hundreds of times. We achieve this by proposing a "CFG-aware latent consistency model," which adapts consistency generation to a latent space and incorporates classifier-free guidance (CFG) into model training. Moreover, unlike diffusion models, ConsistencyTTA can be fine-tuned closed-loop with audio-space, text-aware metrics such as the CLAP score, further enhancing the generations. Our objective and subjective evaluations on the AudioCaps dataset show that, compared to diffusion-based counterparts, ConsistencyTTA reduces inference computation by 400x while retaining generation quality and diversity.
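To make the single-query pipeline concrete, below is a minimal Python sketch of CFG-aware, single-step latent consistency generation. This is an illustration under stated assumptions, not the authors' implementation: the names `consistency_model`, `text_encoder`, `vae_decoder`, and `vocoder` are hypothetical placeholders, and we assume the distilled model accepts the guidance weight as an extra input (the "CFG-aware" idea of folding guidance into training).

```python
# Illustrative sketch only; all module names are hypothetical placeholders,
# not the authors' released code.
import torch

def consistency_tta_generate(consistency_model, text_encoder, vae_decoder,
                             vocoder, prompt: str,
                             latent_shape=(8, 16, 256),
                             guidance_weight: float = 3.0):
    """Generate audio from text with a single denoising-network query.

    A diffusion sampler would call the denoiser tens to hundreds of times
    per generation; the distilled consistency model instead maps a noise
    sample directly to a clean latent in one call. Because CFG is
    incorporated into training, the guidance weight is supplied as a model
    input rather than requiring two network queries per step.
    """
    text_emb = text_encoder(prompt)                 # text conditioning
    z_T = torch.randn(1, *latent_shape)             # pure-noise latent
    w = torch.tensor([guidance_weight])             # CFG weight as input

    # Single non-autoregressive query: noise latent -> clean latent.
    z_0 = consistency_model(z_T, text_emb, w)

    mel = vae_decoder(z_0)        # latent -> mel spectrogram
    return vocoder(mel)           # mel spectrogram -> waveform
```

Conditioning the student network on the guidance weight is what lets one distilled query approximate the guided (conditional plus unconditional) output that a CFG diffusion sampler would otherwise compute with two denoiser calls per step.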

Datasets

AudioCaps

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
| ---- | ------- | ----- | ------ | ----- | ----------- |
| Audio Generation | AudioCaps | ConsistencyTTA (single-step generation) | FAD | 2.18 | #11 |
| Audio Generation | AudioCaps | ConsistencyTTA (single-step generation) | FD | 20.44 | #6 |
