Audio Generation

64 papers with code • 3 benchmarks • 8 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

Latest papers with no code

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

no code yet • 31 Jan 2024

The advent of large models marks a new era in machine learning: by leveraging vast datasets to capture and synthesize complex patterns, they significantly outperform smaller models.

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

no code yet • 14 Jan 2024

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation.

Masked Audio Generation using a Single Non-Autoregressive Transformer

no code yet • 9 Jan 2024

We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.

Efficient Parallel Audio Generation using Group Masked Language Modeling

no code yet • 2 Jan 2024

We present a fast and high-quality codec language model for parallel audio generation.

Audiobox: Unified Audio Generation with Natural Language Prompts

no code yet • 25 Dec 2023

Research communities have made great progress over the past year in advancing the performance of large-scale audio generative models for a single modality (speech, sound, or music) by adopting more powerful generative models and scaling data.

Diffusion-EXR: Controllable Review Generation for Explainable Recommendation via Diffusion Models

no code yet • 24 Dec 2023

The Denoising Diffusion Probabilistic Model (DDPM) has shown great competence in image and audio generation tasks.

CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling

no code yet • 8 Dec 2023

We introduce a multi-modal diffusion model tailored for the bi-directional conditional generation of video and audio.

SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

no code yet • 4 Dec 2023

This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE).

tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models

no code yet • 24 Nov 2023

Contrastive Language-Audio Pretraining (CLAP) has become crucially important in the field of audio and speech processing.

Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation

no code yet • 13 Nov 2023

A metric to measure the spatial perception of audio is proposed for the first time.