Most implemented papers

Adapting Frechet Audio Distance for Generative Music Evaluation

microsoft/fadtk 2 Nov 2023

The growing popularity of generative music models underlines the need for perceptually relevant, objective music quality metrics.

Twitch Plays Pokemon, Machine Learns Twitch: Unsupervised Context-Aware Anomaly Detection for Identifying Trolls in Streaming Data

ahaque/twitch-troll-detection 17 Feb 2019

With the increasing importance of online communities, discussion forums, and customer reviews, Internet "trolls" have proliferated, thereby making it difficult for information seekers to find relevant and correct information.

Representation Sharing for Fast Object Detector Search and Beyond

MalongTech/research-fad ECCV 2020

FAD consists of a designed search space and an efficient architecture search algorithm.

Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks

RussellSB/tt-vae-gan 5 Sep 2021

This research project investigates the application of deep learning to timbre transfer, where the timbre of a source audio can be converted to the timbre of a target audio with minimal loss in quality.

Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms

marcojira/stylegan3-melspectrograms 25 Jun 2022

We describe our approach for the generative emotional vocal burst task (ExVo Generate) of the ICML Expressive Vocalizations Competition.

Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning

lzp870/rsfd 28 Nov 2022

In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens.

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

researchmm/mm-diffusion CVPR 2023

To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion), with two-coupled denoising autoencoders.

Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference

biaofuxmu/fast 14 Mar 2023

A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints.

AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection

zhoujingchun03/amsp-uod 23 Aug 2023

In this paper, we present a novel Amplitude-Modulated Stochastic Perturbation and Vortex Convolutional Network, AMSP-UOD, designed for underwater object detection.

Latent CLAP Loss for Better Foley Sound Synthesis

karchkha/latent-clap-subjective-evaluation 18 Mar 2024

We introduce a new loss term to enhance Foley sound generation in AudioLDM without post-filtering.