NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

microsoft/nuwa 24 Nov 2021

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.
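
A minimal sketch (not the microsoft/nuwa code; all sizes and the toy transformer settings are illustrative assumptions) of the unifying idea: text, image, and video tokens are lifted onto one (frames, height, width) grid so a single transformer can consume 1D, 2D, and 3D data alike.

```python
import torch
import torch.nn as nn

def to_3d_tokens(x):
    """Lift discrete token grids onto a unified (frames, height, width) layout."""
    if x.dim() == 1:        # text: (T,)       -> (1, 1, T)
        return x.view(1, 1, -1)
    if x.dim() == 2:        # image: (H, W)    -> (1, H, W)
        return x.unsqueeze(0)
    return x                # video: (F, H, W) stays 3D

vocab, dim = 1000, 64
embed = nn.Embedding(vocab, dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)

for sample in (torch.randint(vocab, (12,)),        # text tokens
               torch.randint(vocab, (8, 8)),       # image tokens (e.g. from a VQ-VAE)
               torch.randint(vocab, (4, 8, 8))):   # video tokens
    grid = to_3d_tokens(sample)
    seq = embed(grid.flatten()).unsqueeze(0)       # (1, F*H*W, dim)
    print(grid.shape, "->", encoder(seq).shape)
```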

Text-to-Image Generation Video Generation +1

529
5.15 stars / hour

MetaFormer is Actually What You Need for Vision

sail-sg/poolformer 22 Nov 2021

Based on this observation, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance.
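
A hedged sketch of that hypothesis as instantiated in PoolFormer: keep the standard two-residual transformer block layout, but replace self-attention with plain average pooling as the token mixer. Layer sizes here are illustrative; see sail-sg/poolformer for the actual model.

```python
import torch
import torch.nn as nn

class PoolFormerBlock(nn.Module):
    def __init__(self, dim, pool_size=3, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)   # channel-wise norm over (B, C, H, W)
        self.pool = nn.AvgPool2d(pool_size, stride=1,
                                 padding=pool_size // 2, count_include_pad=False)
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1), nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):
        y = self.norm1(x)
        x = x + (self.pool(y) - y)        # token mixing: pooling in place of attention
        x = x + self.mlp(self.norm2(x))   # channel MLP
        return x

block = PoolFormerBlock(dim=64)
print(block(torch.randn(2, 64, 14, 14)).shape)   # torch.Size([2, 64, 14, 14])
```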

Image Classification Semantic Segmentation

274
1.85 stars / hour

PaddleViT

BR-IDL/PaddleViT NeurIPS 2021

🤖 PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

Image Classification Natural Language Inference +2

475
1.73 stars / hour

Resolution-robust Large Mask Inpainting with Fourier Convolutions

saic-mdal/lama 15 Sep 2021

We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function.
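
A rough sketch of why a Fourier convolution gives an image-wide receptive field in a single layer: a pointwise convolution applied in the frequency domain mixes information from the whole image at once. This simplifies LaMa's FFC block and is not the saic-mdal/lama implementation.

```python
import torch
import torch.nn as nn

class SpectralMix(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # real and imaginary parts are stacked along the channel axis
        self.conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)

    def forward(self, x):                                     # x: (B, C, H, W)
        freq = torch.fft.rfft2(x, norm="ortho")               # (B, C, H, W//2+1), complex
        stacked = torch.cat([freq.real, freq.imag], dim=1)    # (B, 2C, H, W//2+1)
        real, imag = self.conv(stacked).chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag),
                                s=x.shape[-2:], norm="ortho")

layer = SpectralMix(channels=8)
print(layer(torch.randn(1, 8, 64, 64)).shape)   # torch.Size([1, 8, 64, 64])
```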

Image Inpainting LAMA

1,974
1.50 stars / hour

KML: Using Machine Learning to Improve Storage Systems

sbu-fsl/kernel-ml 22 Nov 2021

Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput.

131
1.21 stars / hour

Attention Mechanisms in Computer Vision: A Survey

MenghaoGuo/Awesome-Vision-Attentions 15 Nov 2021

Humans can naturally and effectively find salient regions in complex scenes.

Image Classification Image Generation +4

815
1.20 stars / hour

Masked Autoencoders Are Scalable Vision Learners

pengzhiliang/MAE-pytorch 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
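
A minimal sketch of MAE-style random patch masking (illustrative shapes, not the pengzhiliang/MAE-pytorch code): keep a small random subset of patch embeddings for the encoder; the decoder later reconstructs pixels at the masked positions.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """patches: (B, N, D) patch embeddings. Returns kept patches and the binary mask."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                 # random score per patch
    ids_keep = noise.argsort(dim=1)[:, :n_keep]
    kept = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)                  # 1 = masked, 0 = visible
    mask.scatter_(1, ids_keep, 0)
    return kept, mask

patches = torch.randn(2, 196, 768)           # 14x14 patches from a 224x224 image
kept, mask = random_masking(patches)
print(kept.shape, mask.sum(dim=1))           # torch.Size([2, 49, 768]) tensor([147., 147.])
```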

Object Detection Self-Supervised Image Classification +2

925
1.10 stars / hour

Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

sunset1995/directvoxgo 22 Nov 2021

Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it only takes about 15 minutes to train from scratch for a new scene.

Novel View Synthesis

127
1.02 stars / hour

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation

onion-liu/BlendGAN NeurIPS 2021

Specifically, we first train a self-supervised style encoder on the generic artistic dataset to extract the representations of arbitrary styles.
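
A very loose sketch of the conditioning idea in that excerpt: a style encoder maps an arbitrary artistic reference image to a single latent code that conditions face generation. Module sizes and names are illustrative assumptions, not onion-liu/BlendGAN's API.

```python
import torch
import torch.nn as nn

style_encoder = nn.Sequential(               # toy stand-in for the self-supervised style encoder
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 512),
)

ref_art = torch.randn(1, 3, 256, 256)        # arbitrary style reference image
style_code = style_encoder(ref_art)          # (1, 512) style latent
z = torch.randn(1, 512)                      # identity/content latent
# In BlendGAN the two latents are blended inside a StyleGAN2-like generator;
# this sketch only shows that the conditioning signal is a single style vector.
print(style_code.shape, z.shape)
```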

Face Generation

288
1.00 stars / hour

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

SysCV/pcan NeurIPS 2021

We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.
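
A hedged sketch of the "prototypical cross-attention" idea: per-pixel features of the current frame attend to a small set of prototype vectors condensed from past frames, rather than to every past pixel, which keeps spatio-temporal context cheap. Shapes are illustrative and this is not the SysCV/pcan code.

```python
import torch
import torch.nn.functional as F

def prototypical_cross_attention(queries, prototypes):
    """queries: (N, D) current-frame features; prototypes: (K, D) condensed memory."""
    attn = F.softmax(queries @ prototypes.t() / prototypes.shape[-1] ** 0.5, dim=-1)  # (N, K)
    return attn @ prototypes                                                          # (N, D)

feats = torch.randn(1024, 256)    # per-pixel features of the current frame
protos = torch.randn(16, 256)     # K prototypes distilled from past frames
print(prototypical_cross_attention(feats, protos).shape)   # torch.Size([1024, 256])
```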

Multi-Object Tracking and Segmentation Multiple Object Tracking and Segmentation +1

57
0.92 stars / hour