Audio Generation
65 papers with code • 3 benchmarks • 9 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
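At its simplest, "raw audio" means a sequence of waveform samples. A minimal sketch (filename, sample rate, and tone frequency are illustrative choices, not from any paper above) of producing and saving such a waveform with only the Python standard library:

```python
import math
import struct
import wave

# Illustrative sketch: the most basic form of audio generation is producing
# a raw waveform sample-by-sample. Here we synthesize one second of a 440 Hz
# sine tone and save it as a 16-bit mono WAV file.
SAMPLE_RATE = 16000   # samples per second (common rate for speech models)
DURATION = 1.0        # seconds
FREQ = 440.0          # Hz

samples = [
    int(32767 * math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE))
    for n in range(int(SAMPLE_RATE * DURATION))
]

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)             # mono
    wav.setsampwidth(2)             # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Neural audio generators produce the same kind of sample sequence, but predict it with a learned model instead of a closed-form function.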
(Image credit: MelNet)
Latest papers
LooPy: A Research-Friendly Mix Framework for Music Information Retrieval on Electronic Dance Music
Music information retrieval (MIR) has undergone explosive development with the advancement of deep learning in recent years.
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
The immense scale of recent large language models (LLMs) enables many interesting properties, such as instruction- and chain-of-thought-based fine-tuning, which have significantly improved zero- and few-shot performance on many natural language processing (NLP) tasks.
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert
Keywords: Bark, AI voice cloning, Suno, text-to-speech, artificial intelligence, audio generation, Meta's Encodec, audio codebooks, semantic tokens, HuBert, transformer-based model, multilingual speech, wav2vec, linear projection head, embedding space, generative capabilities, pretrained model checkpoints
Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
In this work, we concentrate on a rarely investigated problem of text guided sounding video generation and propose the Sounding Video Generator (SVG), a unified framework for generating realistic videos along with audio signals.
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Can machines recording an audio-visual scene produce realistic, matching audio-visual experiences at novel positions and novel view directions?
ArchiSound: Audio Generation with Diffusion
The recent surge in popularity of diffusion models for image generation has brought new attention to the potential of these models in other areas of media generation.
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text–audio pairs, and the complexity of modeling long, continuous audio data.
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.
AudioGen: Textually Guided Audio Generation
Finally, we explore the ability of the proposed method to generate audio continuation conditionally and unconditionally.
AudioLM: a Language Modeling Approach to Audio Generation
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.
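The language-modeling framing above treats audio as a sequence of discrete tokens sampled autoregressively. A toy sketch of that sampling loop (the "model" here is a random-logits stand-in, and the codebook size is an assumption, not AudioLM's actual architecture):

```python
import numpy as np

# Illustrative sketch: language-model-style audio generation samples discrete
# audio tokens one at a time, each conditioned on the tokens so far.
VOCAB_SIZE = 1024   # size of the discrete audio-token codebook (assumed)
rng = np.random.default_rng(0)

def next_token_logits(context):
    # Stand-in for a trained Transformer over audio tokens.
    return rng.normal(size=VOCAB_SIZE)

def generate(prompt, length):
    tokens = list(prompt)
    for _ in range(length):
        logits = next_token_logits(tokens)
        probs = np.exp(logits - logits.max())   # softmax over the codebook
        probs /= probs.sum()
        tokens.append(int(rng.choice(VOCAB_SIZE, p=probs)))
    return tokens

audio_tokens = generate(prompt=[1, 2, 3], length=20)
# In a real system, a neural codec decoder would map these tokens back to a
# waveform; that step is omitted here.
```

The long-term consistency claimed above comes from conditioning each token on the full preceding sequence, which this loop mirrors in miniature.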