Audio Generation
64 papers with code • 3 benchmarks • 8 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
(Image credit: MelNet)
Latest papers
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
In this paper, we propose a lightweight solution to vision-to-audio generation by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM.
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Any audio can be translated into a "language of audio" (LOA) based on AudioMAE, a self-supervised pre-trained representation learning model.
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.
WavJourney: Compositional Audio Creation with Large Language Models
Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.
High-Fidelity Audio Compression with Improved RVQGAN
Language models have been successfully used to model natural signals, such as images, speech, and music.
MuseCoco: Generating Symbolic Music from Text
In contrast to generated audio, symbolic music offers ease of editing, making it more accessible for users to manipulate specific musical elements.
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
We also explore the robustness of diffusion models to membership inference attacks (MIA) in the text-to-speech (TTS) task, which is an audio generation task.
Any-to-Any Generation via Composable Diffusion
We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.
SoundStorm: Efficient Parallel Audio Generation
We present SoundStorm, a model for efficient, non-autoregressive audio generation.
LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music
Music information retrieval (MIR) has gone through an explosive development with the advancement of deep learning in recent years.