FAD
25 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
Adapting Frechet Audio Distance for Generative Music Evaluation
The growing popularity of generative music models underlines the need for perceptually relevant, objective music quality metrics.
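For orientation, the Frechet Audio Distance named in this title is usually computed as the Frechet distance between two Gaussians fitted to audio embeddings. The formula below is a sketch of that standard definition, not something taken from this abstract. The symbols are assumptions introduced here: \mu_r, \Sigma_r are the mean and covariance of embeddings from a reference set, and \mu_g, \Sigma_g are those of the generated set. The choice of embedding model is left open.

```latex
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

A lower value means the two embedding distributions are closer. The value is zero when both means and covariances coincide.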
Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance
The complex nature of musical emotion introduces inherent bias in both recognition and generation, particularly when relying on a single audio encoder, emotion classifier, or evaluation metric.
Twitch Plays Pokemon, Machine Learns Twitch: Unsupervised Context-Aware Anomaly Detection for Identifying Trolls in Streaming Data
With the increasing importance of online communities, discussion forums, and customer reviews, Internet "trolls" have proliferated, making it difficult for information seekers to find relevant and correct information.
Representation Sharing for Fast Object Detector Search and Beyond
FAD consists of a designed search space and an efficient architecture search algorithm.
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
This research project investigates the application of deep learning to timbre transfer, where the timbre of a source audio can be converted to the timbre of a target audio with minimal loss in quality.
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
We describe our approach for the generative emotional vocal burst task (ExVo Generate) of the ICML Expressive Vocalizations Competition.
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens.
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion), with two-coupled denoising autoencoders.
Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference
A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints.
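The wait-k policy mentioned above can be illustrated with a small scheduling sketch. It assumes the common formulation: read k source tokens first, then alternate between writing one target token and reading one more source token until the source is exhausted. The function name `wait_k_schedule` and the action labels are hypothetical and are not taken from the paper.

```python
def wait_k_schedule(source_len, target_len, k):
    """Return the READ/WRITE action sequence of a wait-k policy.

    Illustrative sketch only: lengths are given up front, whereas a real
    streaming system discovers them as input arrives and output ends.
    """
    actions = []
    read = 0  # number of source tokens consumed so far
    for j in range(target_len):
        # Before emitting target token j, the policy wants k + j source
        # tokens, capped at the full source length.
        needed = min(k + j, source_len)
        while read < needed:
            actions.append(("READ", read))
            read += 1
        actions.append(("WRITE", j))
    return actions


if __name__ == "__main__":
    for action in wait_k_schedule(source_len=4, target_len=3, k=2):
        print(action)
```

A smaller k lowers latency because output starts earlier, while a larger k gives the model more source context before each write.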
AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection
In this paper, we present a novel Amplitude-Modulated Stochastic Perturbation and Vortex Convolutional Network, AMSP-UOD, designed for underwater object detection.