HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

yuval-alaluf/hyperstyle 30 Nov 2021

In this work, we introduce this approach (fine-tuning the generator around the target image) into the realm of encoder-based inversion.


NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

microsoft/nuwa 24 Nov 2021

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.

Text-to-Image Generation, Video Generation, +1 more

Vector Quantized Diffusion Model for Text-to-Image Synthesis

microsoft/vq-diffusion 29 Nov 2021

Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional autoregressive (AR) methods while achieving better image quality.

Denoising, Text-to-Image Generation

End-to-End Referring Video Object Segmentation with Multimodal Transformers

mttr2021/MTTR 29 Nov 2021

Due to the complex nature of this multimodal task, which combines text reasoning, video understanding, instance segmentation and tracking, existing approaches typically rely on sophisticated pipelines in order to tackle it.

Referring Expression Segmentation, Semantic Segmentation, +3 more

OpenUE: An Open Toolkit of Universal Extraction from Text

zjunlp/openue EMNLP 2020

We introduce a prototype model and provide an open-source and extensible toolkit called OpenUE for various extraction tasks.

Event Extraction, Intent Detection

Semi-supervised Implicit Scene Completion from Sparse LiDAR

open-air-sun/sisc 29 Nov 2021

Recent advances show that semi-supervised implicit representation learning can be achieved through physical constraints like Eikonal equations.

Representation Learning
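
For readers unfamiliar with the physical constraint mentioned in this entry: the Eikonal equation states that a signed distance function has a unit-norm gradient almost everywhere. A common way to use it as a training signal is the penalty below. This is a sketch of the general idea only; whether the paper uses this exact form is an assumption, and the symbols f_θ (the learned implicit function) and Ω (the region from which points are sampled) are introduced here for illustration.

```latex
\|\nabla_x f(x)\|_2 = 1,
\qquad
\mathcal{L}_{\text{eik}}(\theta)
  = \mathbb{E}_{x \sim \Omega}
    \left[ \left( \|\nabla_x f_\theta(x)\|_2 - 1 \right)^2 \right]
```

Because the penalty needs no labels, it can be evaluated at any sampled point, which is what makes it usable where ground-truth supervision is sparse.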

The Devil is the Classifier: Investigating Long Tail Relation Classification with Decoupling Analysis

zjunlp/deepke 15 Sep 2020

Long-tailed relation classification is a challenging problem as the head classes may dominate the training phase, thereby leading to the deterioration of the tail performance.

General Classification, Relation Classification

Masked Autoencoders Are Scalable Vision Learners

pengzhiliang/MAE-pytorch 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Object Detection, Self-Supervised Image Classification, +2 more
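
The masking step described in this entry can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation: the function names, the list-based image representation, the 0.75 default mask ratio, and the choice to score the reconstruction on masked patches only are all assumptions made for the example.

```python
import random


def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patches, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch)
                            for dx in range(patch)])
    return patches


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Return (visible_idx, masked_idx) for a uniformly random patch mask.

    mask_ratio is an assumed default, not taken from the listing above.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


def masked_mse(pred, target, masked_idx):
    """Mean squared pixel error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Usage: an 8x8 toy image cut into 2x2 patches gives 16 patches.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(image, patch=2)
visible, masked = random_patch_mask(len(patches), seed=0)
# An encoder would see only patches[i] for i in visible; a decoder would
# then predict every patch, and the loss is scored on the masked ones.
```

Scoring only the masked patches keeps the objective focused on the pixels the model never saw.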

WantWords: An Open-source Online Reverse Dictionary System

thunlp/WantWords EMNLP 2020

A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions.
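
To make the input/output contract of a reverse dictionary concrete, here is a toy sketch that ranks words by word overlap (Jaccard similarity) between the query and each stored definition. It is not how WantWords works; the glossary, the function names, and the scoring rule are all invented for illustration.

```python
import re

# Hypothetical mini-glossary, invented for this example only.
GLOSSARY = {
    "telescope": "an instrument for viewing distant objects such as stars",
    "microscope": "an instrument for viewing very small objects",
    "thermometer": "an instrument for measuring temperature",
}


def tokens(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def reverse_lookup(description, glossary=GLOSSARY, top_k=3):
    """Return up to top_k words whose definitions best overlap the description."""
    query = tokens(description)
    scored = []
    for word, definition in glossary.items():
        defn = tokens(definition)
        union = query | defn
        score = len(query & defn) / len(union) if union else 0.0
        scored.append((score, word))
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for score, word in scored[:top_k] if score > 0]


# Usage: describe the word you are looking for.
candidates = reverse_lookup("a device for measuring temperature")
```

Word overlap only matches surface forms, so a description that paraphrases the definition with different vocabulary would score poorly; matching on meaning rather than shared words is the harder part of the task.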

