Audio Generation
63 papers with code • 3 benchmarks • 8 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
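"Raw audio" here simply means a sequence of amplitude samples at a fixed sample rate, which a model must produce directly. As a minimal illustration (not tied to any model listed below), a synthetic waveform can be built sample by sample:

```python
import math

def sine_wave(freq_hz, duration_s, sample_rate=16000, amplitude=0.5):
    """Generate raw audio as a list of float samples in [-1, 1]."""
    n_samples = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]

# A 440 Hz tone, 10 ms at 16 kHz -> 160 samples
samples = sine_wave(440, 0.01)
```

Generative models face the hard version of this: producing tens of thousands of such samples per second that together sound like coherent speech or music.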
(Image credit: MelNet)
Latest papers
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Any audio can be translated into LOA (language of audio) based on AudioMAE, a self-supervised pre-trained representation learning model.
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.
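Diffusion models generate data by learning to reverse a gradual noising process. As background for the excerpt above, here is a toy sketch of the standard closed-form forward (noising) step from the DDPM formulation, applied to a fake audio signal; it is illustrative, not code from any paper listed here:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)[t]      # cumulative product of alphas up to step t
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16000)       # stand-in for 1 s of audio at 16 kHz
betas = np.linspace(1e-4, 0.02, 1000) # a common linear noise schedule
x_t = forward_diffuse(x0, 999, betas, rng)
# at the final step, x_t is close to pure Gaussian noise
```

The generative direction trains a network to undo these steps, optionally conditioned on text; text-to-audio systems such as those above run this reverse process in a learned latent space rather than on raw samples.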
WavJourney: Compositional Audio Creation with Large Language Models
Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.
High-Fidelity Audio Compression with Improved RVQGAN
Language models have been successfully used to model natural signals, such as images, speech, and music.
MuseCoco: Generating Symbolic Music from Text
In contrast, symbolic music offers ease of editing, making it more accessible for users to manipulate specific musical elements.
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
Therefore, we also explore the robustness of diffusion models to MIA in the text-to-speech (TTS) task, which is an audio generation task.
Any-to-Any Generation via Composable Diffusion
We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.
SoundStorm: Efficient Parallel Audio Generation
We present SoundStorm, a model for efficient, non-autoregressive audio generation.
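Non-autoregressive here means the model fills in many audio tokens per step instead of one at a time; SoundStorm adapts confidence-based parallel decoding in the spirit of MaskGIT. The sketch below shows only that general scheduling idea, with a random stand-in where the real network's predictions and confidences would go:

```python
import random

MASK = -1

def parallel_decode(seq_len, vocab_size, steps, seed=0):
    """Iteratively fill a fully masked token sequence, committing the
    most confident predictions at each step (MaskGIT-style schedule)."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # stand-in for the model: random token and random confidence
        preds = {i: (rng.randrange(vocab_size), rng.random()) for i in masked}
        # commit a growing fraction of positions each step; the rest stay masked
        n_keep = max(1, len(masked) * (step + 1) // steps)
        for i in sorted(masked, key=lambda i: -preds[i][1])[:n_keep]:
            tokens[i] = preds[i][0]
    return tokens

tokens = parallel_decode(seq_len=32, vocab_size=100, steps=8)
```

A handful of such steps can replace thousands of sequential sampling steps, which is where the efficiency claim in the excerpt comes from.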
LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music
Music information retrieval (MIR) has gone through an explosive development with the advancement of deep learning in recent years.
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
The immense scale of recent large language models (LLMs) enables many interesting properties, such as instruction- and chain-of-thought-based fine-tuning, that have significantly improved zero- and few-shot performance on many natural language processing (NLP) tasks.