Audio Generation

64 papers with code • 3 benchmarks • 8 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

heng-hw/V2A-Mapper 18 Aug 2023

In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM.

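The core of V2A-Mapper is a small translator network that maps embeddings from a frozen vision foundation model (CLIP) into the conditioning space of a frozen audio generator (CLAP/AudioLDM), so only the mapper needs training. A minimal PyTorch sketch of such a mapper is below; the MLP layout, dimensions, and names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class V2AMapper(nn.Module):
    """Hypothetical lightweight mapper: translates a CLIP image embedding into a
    CLAP-style audio embedding that a pretrained audio generator (e.g. AudioLDM)
    can consume as conditioning. Dimensions and layout are assumptions."""
    def __init__(self, clip_dim=512, clap_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, clap_dim),
        )

    def forward(self, clip_embedding):
        return self.net(clip_embedding)

# Usage sketch: only the mapper is trained; CLIP, CLAP and AudioLDM stay frozen.
mapper = V2AMapper()
fake_clip_embeddings = torch.randn(4, 512)   # stand-in for frozen CLIP outputs
clap_like = mapper(fake_clip_embeddings)     # would condition the audio generator
```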

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

haoheliu/AudioLDM2 10 Aug 2023

Any audio can be translated into LOA (the "language of audio") based on AudioMAE, a self-supervised pre-trained representation learning model.

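For reference, AudioLDM 2 has an inference pipeline in Hugging Face diffusers; a minimal text-to-audio call looks roughly like the following. The checkpoint id, step count, and clip length are one plausible configuration rather than the repository's exact example.

```python
import torch
import scipy.io.wavfile
from diffusers import AudioLDM2Pipeline

# Assumed checkpoint id; see the haoheliu/AudioLDM2 repo for the official options.
pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "A dog barking in the distance while birds are chirping"
audio = pipe(prompt, num_inference_steps=200, audio_length_in_s=10.0).audios[0]

# The pipeline returns a 16 kHz waveform as a NumPy array.
scipy.io.wavfile.write("generated.wav", rate=16000, data=audio)
```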

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

retrocirce/musicldm 3 Aug 2023

Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.

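The beat-synchronous mixup strategies boil down to interpolating two training clips only after they have been tempo- and beat-aligned. A bare-bones sketch of the mixup arithmetic (assuming alignment already happened upstream) is shown below; it is an illustration, not the MusicLDM training code.

```python
import numpy as np

def beat_synchronous_mixup(wave_a, wave_b, alpha=0.4):
    """Mix two beat-aligned, equal-length waveforms with a Beta-sampled weight.
    Tempo estimation and time-stretching (the 'beat-synchronous' part) are
    assumed to have happened upstream; this shows only the mixup arithmetic."""
    assert wave_a.shape == wave_b.shape
    lam = np.random.beta(alpha, alpha)
    return lam * wave_a + (1.0 - lam) * wave_b, lam

# Toy usage with random signals standing in for two beat-aligned music clips.
a = np.random.randn(16000 * 10)
b = np.random.randn(16000 * 10)
mixed, lam = beat_synchronous_mixup(a, b)
```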

WavJourney: Compositional Audio Creation with Large Language Models

audio-agi/wavjourney 26 Jul 2023

Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.

High-Fidelity Audio Compression with Improved RVQGAN

descriptinc/descript-audio-codec 11 Jun 2023 (NeurIPS 2023)

Language models have been successfully used to model natural signals, such as images, speech, and music.

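The codec behind this paper quantizes latents with a residual vector quantizer (RVQ): each codebook encodes whatever residual the previous codebooks left behind. A minimal NumPy sketch of RVQ encoding follows; the Improved RVQGAN itself adds factorized, normalized codes and adversarial training on top of this basic idea.

```python
import numpy as np

def rvq_encode(z, codebooks):
    """Residual vector quantization of one latent vector z: every stage picks
    the nearest code to the *remaining residual*, so later codebooks refine
    what earlier ones could not capture."""
    quantized = np.zeros_like(z)
    codes = []
    for cb in codebooks:                                  # cb: (codebook_size, dim)
        residual = z - quantized
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        quantized = quantized + cb[idx]
    return codes, quantized

# Toy usage: 4 codebooks of 1024 entries over a 64-dim latent.
rng = np.random.default_rng(0)
books = [rng.standard_normal((1024, 64)) for _ in range(4)]
codes, z_hat = rvq_encode(rng.standard_normal(64), books)
```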

MuseCoco: Generating Symbolic Music from Text

microsoft/muzic 31 May 2023

In contrast, symbolic music offers ease of editing, making it more accessible for users to manipulate specific musical elements.

An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization

kong13661/pia 26 May 2023

Therefore, we also explore the robustness of diffusion models to membership inference attacks (MIA) in the text-to-speech (TTS) task, which is an audio generation task.

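For intuition, a common baseline membership inference attack on diffusion models thresholds the model's noise-prediction error on a candidate sample; roughly speaking, the paper's proximal-initialization attack builds on this kind of error signal. The sketch below shows only the generic error-based score, with a stand-in model, and is not the paper's method.

```python
import torch

@torch.no_grad()
def mia_score(eps_model, x0, alpha_bar_t):
    """Generic error-threshold membership signal for a diffusion model:
    noise the sample to step t, ask the model for the noise, and score by the
    prediction error (training members tend to score lower)."""
    noise = torch.randn_like(x0)
    x_t = alpha_bar_t ** 0.5 * x0 + (1.0 - alpha_bar_t) ** 0.5 * noise
    pred = eps_model(x_t)                  # assumed to predict the added noise
    return torch.mean((pred - noise) ** 2).item()

# Toy usage with a stand-in "model": real attacks calibrate a threshold tau on
# known members/non-members and flag samples with score < tau as members.
dummy_eps_model = lambda x_t: torch.zeros_like(x_t)
score = mia_score(dummy_eps_model, torch.randn(1, 80, 100), alpha_bar_t=0.5)
```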

Any-to-Any Generation via Composable Diffusion

microsoft/i-Code 19 May 2023 (NeurIPS 2023)

We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.

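CoDi composes heterogeneous inputs by projecting each modality into a shared, aligned embedding space and mixing the embeddings before they condition the generators. A toy sketch of that composition step is below; the weighted-sum interpolation and the 768-dim size are illustrative assumptions, not the released implementation.

```python
import torch

def compose_conditions(embeddings, weights=None):
    """Toy composition of prompts from different modalities that have already
    been projected into one shared embedding space (CoDi aligns its prompt
    encoders so that this kind of interpolation is meaningful)."""
    if weights is None:
        weights = [1.0 / len(embeddings)] * len(embeddings)
    stacked = torch.stack(embeddings)              # (n_conditions, dim)
    w = torch.tensor(weights).unsqueeze(-1)        # (n_conditions, 1)
    return (w * stacked).sum(dim=0)                # (dim,)

# E.g. a text embedding plus an image embedding, both assumed to be 768-d and
# already aligned; the result conditions the target-modality diffuser.
text_emb, image_emb = torch.randn(768), torch.randn(768)
joint_condition = compose_conditions([text_emb, image_emb], weights=[0.5, 0.5])
```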

SoundStorm: Efficient Parallel Audio Generation

lucidrains/soundstorm-pytorch 16 May 2023

We present SoundStorm, a model for efficient, non-autoregressive audio generation.

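SoundStorm's efficiency comes from MaskGIT-style confidence-based parallel decoding over audio-codec tokens: start from a fully masked sequence and, over a few steps, commit the most confident predictions while re-masking the rest. A toy single-codebook version of that loop is sketched below with a random stand-in predictor; it illustrates the decoding schedule, not the released model.

```python
import torch

@torch.no_grad()
def parallel_decode(predict_logits, seq_len, steps=8, mask_id=-1):
    """MaskGIT-style confidence-based decoding (the scheme SoundStorm adapts to
    audio-codec tokens): begin fully masked and, at every step, keep the
    highest-confidence predictions while re-masking the remainder."""
    tokens = torch.full((seq_len,), mask_id)
    for step in range(steps):
        logits = predict_logits(tokens)            # (seq_len, vocab_size)
        conf, pred = logits.softmax(-1).max(-1)
        conf[tokens != mask_id] = float("inf")     # already-committed tokens stay
        n_keep = max(1, int(seq_len * (step + 1) / steps))
        keep = conf.topk(n_keep).indices
        new_tokens = torch.full_like(tokens, mask_id)
        new_tokens[keep] = torch.where(tokens[keep] != mask_id,
                                       tokens[keep], pred[keep])
        tokens = new_tokens
    return tokens

# Toy usage: a random "predictor" over a 1024-entry codebook. A real system
# conditions this on semantic tokens derived from the prompt.
dummy_predictor = lambda toks: torch.randn(toks.shape[0], 1024)
codes = parallel_decode(dummy_predictor, seq_len=50)
```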

LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music

gariscat/loopy 1 May 2023

Music information retrieval (MIR) has gone through an explosive development with the advancement of deep learning in recent years.
