AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

haoheliu/audioldm_eval 29 Jan 2023

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.

Audio Generation, Style Transfer

Towards Robust Blind Face Restoration with Codebook Lookup Transformer

sczhou/codeformer 22 Jun 2022

In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces.

Blind Face Restoration

Cut and Learn for Unsupervised Object Detection and Instance Segmentation

facebookresearch/cutler 26 Jan 2023

We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models.

Instance Segmentation, Object Detection

Learning the Beauty in Songs: Neural Singing Voice Beautifier

MoonInTheRiver/DiffSinger ACL 2022

Furthermore, we propose a latent-mapping algorithm in the latent space to convert the amateur vocal tone to the professional one.

Dynamic Time Warping

LogAI: A Library for Log Analytics and Intelligence

salesforce/logai 31 Jan 2023

In order to enable users to perform multiple types of AI-based log analysis tasks in a uniform manner, we introduce LogAI (https://github.com/salesforce/logai), a one-stop open source library for log analytics and intelligence.

Anomaly Detection, Log Parsing

DAMO-YOLO : A Report on Real-Time Object Detection Design

tinyvision/damo-yolo 23 Nov 2022

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.

Neural Architecture Search, Object Detection

A Length-Extrapolatable Transformer

microsoft/torchscale 20 Dec 2022

Position modeling plays a critical role in Transformers.

Language Modelling

Disentangling Random and Cyclic Effects in Time-Lapse Sequences

harskish/tlgan 4 Jul 2022

We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal.

PADL: Language-Directed Physics-Based Character Control

nv-tlabs/padl 31 Jan 2023

In this work, we present PADL, which leverages recent innovations in NLP in order to take steps towards developing language-directed controllers for physics-based character animation.

Image Generation, Imitation Learning

Cross-domain Neural Pitch and Periodicity Estimation

interactiveaudiolab/penn 28 Jan 2023

Pitch is a foundational aspect of our perception of audio signals.

Music Transcription

