Audio Generation

64 papers with code • 3 benchmarks • 8 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

Benchmarks

These leaderboards track progress in audio generation.

Dataset	Best Model
AudioCaps	Audiobox
Classical music, 5 seconds at 12 kHz	Sparse Transformer 152M (strided)
Symphony music	SymphonyNet

Latest papers with no code

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

no code yet • 31 Jan 2024

The advent of Large Models marks a new era in machine learning, significantly outperforming smaller models by leveraging vast datasets to capture and synthesize complex patterns.

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

no code yet • 14 Jan 2024

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation.

Masked Audio Generation using a Single Non-Autoregressive Transformer

no code yet • 9 Jan 2024

We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.

Efficient Parallel Audio Generation using Group Masked Language Modeling

no code yet • 2 Jan 2024

We present a fast and high-quality codec language model for parallel audio generation.

Audiobox: Unified Audio Generation with Natural Language Prompts

no code yet • 25 Dec 2023

Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single modality (speech, sound, or music) through adopting more powerful generative models and scaling data.

Diffusion-EXR: Controllable Review Generation for Explainable Recommendation via Diffusion Models

no code yet • 24 Dec 2023

Denoising Diffusion Probabilistic Model (DDPM) has shown great competence in image and audio generation tasks.

CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling

no code yet • 8 Dec 2023

We introduce a multi-modal diffusion model tailored for the bi-directional conditional generation of video and audio.

SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

no code yet • 4 Dec 2023

This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE).

tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models

no code yet • 24 Nov 2023

Contrastive Language-Audio Pretraining (CLAP) became of crucial importance in the field of audio and speech processing.

Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation

no code yet • 13 Nov 2023

To this end, a metric to measure the spatial perception of audio is proposed for the first time.
