FAD

25 papers with code • 0 benchmarks • 0 datasets

Most implemented papers

Adapting Frechet Audio Distance for Generative Music Evaluation

microsoft/fadtk 2 Nov 2023

The growing popularity of generative music models underlines the need for perceptually relevant, objective music quality metrics.
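Frechet Audio Distance (FAD) fits a multivariate Gaussian to embeddings of reference audio and another to embeddings of generated audio, then scores the pair with the Fréchet distance ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The following is a minimal NumPy-only sketch of that formula. It is not the microsoft/fadtk API. The function names are hypothetical, and it assumes the embeddings (for example, per-frame outputs of some audio encoder) have already been extracted as `(n_frames, dim)` arrays.

```python
import numpy as np


def _sqrt_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative round-off eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(emb_a, emb_b):
    """Fréchet distance between Gaussians fit to two embedding sets.

    emb_a, emb_b: arrays of shape (n_frames, dim). Hypothetical helper,
    not part of any library.
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    # atleast_2d keeps the dim == 1 case a 1x1 matrix instead of a scalar
    cov_a = np.atleast_2d(np.cov(emb_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(emb_b, rowvar=False))

    # Tr((cov_a cov_b)^{1/2}) equals Tr((S cov_b S)^{1/2}) with S = cov_a^{1/2};
    # the inner matrix is symmetric PSD, so eigvalsh applies.
    sqrt_a = _sqrt_psd(cov_a)
    inner = sqrt_a @ cov_b @ sqrt_a
    tr_sqrt = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)))

    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(500, 4))
    print(frechet_distance(ref, ref))        # approximately 0
    print(frechet_distance(ref, ref + 3.0))  # approximately 4 * 3**2 = 36
```

Rewriting the trace term with the symmetric matrix S Σ_g S avoids taking a general (non-symmetric) matrix square root. That choice is made here for numerical convenience. Whether fadtk computes the term this way is not stated on this page.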

Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance

microsoft/fadtk 23 Sep 2024

The complex nature of musical emotion introduces inherent bias in both recognition and generation, particularly when relying on a single audio encoder, emotion classifier, or evaluation metric.

Twitch Plays Pokemon, Machine Learns Twitch: Unsupervised Context-Aware Anomaly Detection for Identifying Trolls in Streaming Data

ahaque/twitch-troll-detection 17 Feb 2019

With the increasing importance of online communities, discussion forums, and customer reviews, Internet "trolls" have proliferated, thereby making it difficult for information seekers to find relevant and correct information.

Representation Sharing for Fast Object Detector Search and Beyond

MalongTech/research-fad ECCV 2020

FAD consists of a designed search space and an efficient architecture search algorithm.

Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks

RussellSB/tt-vae-gan 5 Sep 2021

This research project investigates the application of deep learning to timbre transfer, where the timbre of a source audio can be converted to the timbre of a target audio with minimal loss in quality.

Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms

marcojira/stylegan3-melspectrograms 25 Jun 2022

We describe our approach for the generative emotional vocal burst task (ExVo Generate) of the ICML Expressive Vocalizations Competition.

Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning

lzp870/rsfd 28 Nov 2022

In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens.

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

researchmm/mm-diffusion CVPR 2023

To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion) with two coupled denoising autoencoders.

Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference

biaofuxmu/fast 14 Mar 2023

A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints.

AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection

zhoujingchun03/amsp-uod 23 Aug 2023

In this paper, we present a novel Amplitude-Modulated Stochastic Perturbation and Vortex Convolutional Network, AMSP-UOD, designed for underwater object detection.