Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

showlab/Tune-A-Video 22 Dec 2022

To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation fine-tune on large-scale text-video datasets.

Style Transfer, Text-to-Video Generation

434 stars (1.67 stars / hour)

Cut and Learn for Unsupervised Object Detection and Instance Segmentation

facebookresearch/cutler 26 Jan 2023

We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models.

Instance Segmentation, Object Detection

260 stars (1.52 stars / hour)

On the Expressive Power of Geometric Graph Neural Networks

chaitjo/geometric-gnn-dojo 23 Jan 2023

The expressive power of Graph Neural Networks (GNNs) has been studied extensively through the Weisfeiler-Leman (WL) graph isomorphism test.

194 stars (1.40 stars / hour)

ArchiSound: Audio Generation with Diffusion

archinetai/audio-diffusion-pytorch 30 Jan 2023

The recent surge in popularity of diffusion models for image generation has brought new attention to the potential of these models in other areas of media generation.

Audio Generation, Image Generation

951 stars (1.33 stars / hour)

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

haoheliu/audioldm_eval 29 Jan 2023

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.

Audio Generation, Style Transfer

55 stars (1.29 stars / hour)

Learning the Beauty in Songs: Neural Singing Voice Beautifier

MoonInTheRiver/DiffSinger ACL 2022

We also propose a latent-mapping algorithm that converts the amateur vocal tone into a professional one in the latent space.

Dynamic Time Warping

1,742 stars (1.07 stars / hour)

InstructPix2Pix: Learning to Follow Image Editing Instructions

timothybrooks/instruct-pix2pix 17 Nov 2022

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image.

Language Modelling, Text-based Image Editing

3,002 stars (1.01 stars / hour)

Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models

ezelikman/parsel 20 Dec 2022

Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs.

Automated Theorem Proving, Code Generation

81 stars (0.95 stars / hour)

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining

microsoft/biogpt 19 Oct 2022

Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain.

Document Classification, Language Modelling

373 stars (0.85 stars / hour)

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

salesforce/lavis 30 Jan 2023

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.

Image Captioning, Image Retrieval

1,630 stars (0.78 stars / hour)