Cut and Learn for Unsupervised Object Detection and Instance Segmentation

facebookresearch/cutler 26 Jan 2023

We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models.

Instance Segmentation object-detection +2

232
1.65 stars / hour

Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion

archinetai/audio-diffusion-pytorch 27 Jan 2023

In our work, we investigate the potential of diffusion models for text-conditional music generation.

Image Generation Music Generation

917
1.46 stars / hour

On the Expressive Power of Geometric Graph Neural Networks

chaitjo/geometric-gnn-dojo 23 Jan 2023

The expressive power of Graph Neural Networks (GNNs) has been studied extensively through the Weisfeiler-Leman (WL) graph isomorphism test.

185
1.29 stars / hour

Learning the Beauty in Songs: Neural Singing Voice Beautifier

MoonInTheRiver/DiffSinger ACL 2022

We propose a latent-mapping algorithm in the latent space to convert the amateur vocal tone to the professional one.

Dynamic Time Warping

1,696
1.19 stars / hour

Text2LIVE: Text-Driven Layered Image and Video Editing

omerbt/Text2LIVE 5 Apr 2022

Given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., object's texture) or augment the scene with visual effects (e.g., smoke, fire) in a semantically meaningful manner.

455
1.17 stars / hour

InstructPix2Pix: Learning to Follow Image Editing Instructions

timothybrooks/instruct-pix2pix 17 Nov 2022

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image.

Language Modelling Text-based Image Editing +1

2,924
1.15 stars / hour

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

showlab/Tune-A-Video 22 Dec 2022

To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video datasets for fine-tuning.

Style Transfer Text-to-Video Generation +1

371
1.06 stars / hour

SNAKE: Shape-aware Neural 3D Keypoint Field

zhongcl-thu/snake 3 Jun 2022

Detecting 3D keypoints from point clouds is important for shape reconstruction; this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?

Keypoint Detection

192
1.01 stars / hour

VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and Uncertainty-Spectrum Modeling

air-discover/vibus 20 Oct 2022

In the first stage, we perform self-supervised representation learning on unlabeled points with the proposed Viewpoint Bottleneck loss function.

Representation Learning Scene Parsing

156
0.99 stars / hour

Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models

ezelikman/parsel 20 Dec 2022

Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs.

Automated Theorem Proving Code Generation +2

68
0.96 stars / hour