Audio Generation

63 papers with code • 3 benchmarks • 8 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

haoheliu/AudioLDM2 10 Aug 2023

Any audio can be translated into a "language of audio" (LOA) representation based on AudioMAE, a self-supervised pre-trained representation learning model.

2,032

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

retrocirce/musicldm 3 Aug 2023

Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.

114

WavJourney: Compositional Audio Creation with Large Language Models

audio-agi/wavjourney 26 Jul 2023

Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.

502

High-Fidelity Audio Compression with Improved RVQGAN

descriptinc/descript-audio-codec NeurIPS 2023

Language models have been successfully used to model natural signals, such as images, speech, and music.

834
11 Jun 2023

MuseCoco: Generating Symbolic Music from Text

microsoft/muzic 31 May 2023

In contrast, symbolic music offers ease of editing, making it more accessible for users to manipulate specific musical elements.

4,181

An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization

kong13661/pia 26 May 2023

Therefore, we also explore the robustness of diffusion models to MIA in the text-to-speech (TTS) task, which is an audio generation task.

4

Any-to-Any Generation via Composable Diffusion

microsoft/i-Code NeurIPS 2023

We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.

1,632
19 May 2023

SoundStorm: Efficient Parallel Audio Generation

lucidrains/soundstorm-pytorch 16 May 2023

We present SoundStorm, a model for efficient, non-autoregressive audio generation.

1,110

LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music

gariscat/loopy 1 May 2023

Music information retrieval (MIR) has undergone explosive development in recent years with the advancement of deep learning.

22

Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

declare-lab/tango 24 Apr 2023

The immense scale of recent large language models (LLMs) enables many interesting properties, such as instruction- and chain-of-thought-based fine-tuning, that have significantly improved zero- and few-shot performance in many natural language processing (NLP) tasks.

901