Audio Generation
63 papers with code • 3 benchmarks • 8 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
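"Raw audio" here simply means a sequence of amplitude samples at a fixed sample rate, which a model must produce directly. As a minimal illustration (not tied to any model listed below), a synthetic waveform can be built sample by sample:

```python
import math

def sine_wave(freq_hz, duration_s, sample_rate=16000, amplitude=0.5):
    """Generate raw audio as a list of float samples in [-1, 1]."""
    n_samples = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]

# A 440 Hz tone, 10 ms at 16 kHz -> 160 samples
samples = sine_wave(440, 0.01)
```

Generative models face the hard version of this: producing tens of thousands of such samples per second that together sound like coherent speech or music.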
(Image credit: MelNet)
Latest papers
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Any audio can be translated into LOA (language of audio) based on AudioMAE, a self-supervised pre-trained representation learning model.
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.
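Diffusion models generate data by learning to reverse a gradual noising process. As background for the excerpt above, here is a toy sketch of the standard closed-form forward (noising) step from the DDPM formulation, applied to a fake audio signal; it is illustrative, not code from any paper listed here:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)[t]      # cumulative product of alphas up to step t
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16000)       # stand-in for 1 s of audio at 16 kHz
betas = np.linspace(1e-4, 0.02, 1000) # a common linear noise schedule
x_t = forward_diffuse(x0, 999, betas, rng)
# at the final step, x_t is close to pure Gaussian noise
```

The generative direction trains a network to undo these steps, optionally conditioned on text; text-to-audio systems such as those above run this reverse process in a learned latent space rather than on raw samples.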
WavJourney: Compositional Audio Creation with Large Language Models
Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text.
High-Fidelity Audio Compression with Improved RVQGAN
Language models have been successfully used to model natural signals, such as images, speech, and music.
MuseCoco: Generating Symbolic Music from Text
In contrast, symbolic music offers ease of editing, making it more accessible for users to manipulate specific musical elements.
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
Therefore, we also explore the robustness of diffusion models to MIA in the text-to-speech (TTS) task, which is an audio generation task.
Any-to-Any Generation via Composable Diffusion
We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities.
SoundStorm: Efficient Parallel Audio Generation
We present SoundStorm, a model for efficient, non-autoregressive audio generation.
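Non-autoregressive here means the model fills in many audio tokens per step instead of one at a time; SoundStorm adapts confidence-based parallel decoding in the spirit of MaskGIT. The sketch below shows only that general scheduling idea, with a random stand-in where the real network's predictions and confidences would go:

```python
import random

MASK = -1

def parallel_decode(seq_len, vocab_size, steps, seed=0):
    """Iteratively fill a fully masked token sequence, committing the
    most confident predictions at each step (MaskGIT-style schedule)."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # stand-in for the model: random token and random confidence
        preds = {i: (rng.randrange(vocab_size), rng.random()) for i in masked}
        # commit a growing fraction of positions each step; the rest stay masked
        n_keep = max(1, len(masked) * (step + 1) // steps)
        for i in sorted(masked, key=lambda i: -preds[i][1])[:n_keep]:
            tokens[i] = preds[i][0]
    return tokens

tokens = parallel_decode(seq_len=32, vocab_size=100, steps=8)
```

A handful of such steps can replace thousands of sequential sampling steps, which is where the efficiency claim in the excerpt comes from.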
LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music
Music information retrieval (MIR) has gone through an explosive development with the advancement of deep learning in recent years.
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
The immense scale of recent large language models (LLMs) enables many interesting properties, such as instruction- and chain-of-thought-based fine-tuning, that have significantly improved zero- and few-shot performance on many natural language processing (NLP) tasks.