FAD
10 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
Adapting Frechet Audio Distance for Generative Music Evaluation
The growing popularity of generative music models underlines the need for perceptually relevant, objective music quality metrics.
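Fréchet Audio Distance itself compares the statistics of embeddings extracted from reference and generated audio: each set is summarized by a mean and covariance, and the score is the Fréchet distance between the two Gaussians, FAD = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A minimal sketch of that closed form (function names are illustrative; a real pipeline would first compute the embeddings with a model such as VGGish or CLAP):

```python
import numpy as np

def _sqrtm_psd(a):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_audio_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Fréchet distance between N(mu_r, sigma_r) and N(mu_g, sigma_g).

    Tr((Sr Sg)^(1/2)) is computed as Tr((Sg^(1/2) Sr Sg^(1/2))^(1/2)),
    which keeps the intermediate matrix symmetric PSD.
    """
    diff = mu_r - mu_g
    sg_sqrt = _sqrtm_psd(sigma_g)
    covmean_trace = np.trace(_sqrtm_psd(sg_sqrt @ sigma_r @ sg_sqrt))
    return float(diff @ diff
                 + np.trace(sigma_r) + np.trace(sigma_g)
                 - 2.0 * covmean_trace)
```

With identical statistics the distance is zero; shifting one mean by a unit vector under identity covariances yields exactly 1.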
Twitch Plays Pokemon, Machine Learns Twitch: Unsupervised Context-Aware Anomaly Detection for Identifying Trolls in Streaming Data
With the increasing importance of online communities, discussion forums, and customer reviews, Internet "trolls" have proliferated, making it difficult for information seekers to find relevant and correct information.
Representation Sharing for Fast Object Detector Search and Beyond
FAD consists of a designed search space and an efficient architecture search algorithm.
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
This research project investigates the application of deep learning to timbre transfer, where the timbre of a source audio can be converted to the timbre of a target audio with minimal loss in quality.
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
We describe our approach for the generative emotional vocal burst task (ExVo Generate) of the ICML Expressive Vocalizations Competition.
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens.
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion), with two-coupled denoising autoencoders.
Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference
A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints.
AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection
In this paper, we present a novel Amplitude-Modulated Stochastic Perturbation and Vortex Convolutional Network, AMSP-UOD, designed for underwater object detection.
Latent CLAP Loss for Better Foley Sound Synthesis
We introduce a new loss term to enhance Foley sound generation in AudioLDM without post-filtering.