NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

microsoft/nuwa 24 Nov 2021

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.

Text-to-Image Generation, Video Generation, +1 more

405
5.77 stars / hour

MetaFormer is Actually What You Need for Vision

sail-sg/poolformer 22 Nov 2021

Based on this observation, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance.

Image Classification, Semantic Segmentation

263
2.48 stars / hour

KML: Using Machine Learning to Improve Storage Systems

sbu-fsl/kernel-ml 22 Nov 2021

Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput.

118
2.02 stars / hour

PaddleViT

BR-IDL/PaddleViT ICCV 2021

PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

Classification, General Classification, +1 more

461
1.65 stars / hour

Resolution-robust Large Mask Inpainting with Fourier Convolutions

saic-mdal/lama 15 Sep 2021

We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function.

Image Inpainting, LAMA

1,952
1.64 stars / hour

Attention Mechanisms in Computer Vision: A Survey

MenghaoGuo/Awesome-Vision-Attentions 15 Nov 2021

Humans can naturally and effectively find salient regions in complex scenes.

Image Classification, Image Generation, +4 more

759
1.57 stars / hour

Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

sunset1995/directvoxgo 22 Nov 2021

Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF's quality, yet it only takes about 15 minutes to train from scratch for a new scene.

Novel View Synthesis

119
1.36 stars / hour

Investigating Tradeoffs in Real-World Video Super-Resolution

ckkelvinchan/realbasicvsr 24 Nov 2021

The diversity and complexity of degradations in real-world video super-resolution (VSR) pose non-trivial challenges in inference and training.

Video Super-Resolution

51
1.21 stars / hour

Masked Autoencoders Are Scalable Vision Learners

pengzhiliang/MAE-pytorch 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Object Detection, Self-Supervised Image Classification, +2 more

901
1.16 stars / hour
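
The masking step described in the MAE entry above (mask random patches, then reconstruct the missing pixels) can be illustrated with a minimal NumPy sketch. This is not code from the listed repository. The function names `random_patch_mask` and `masked_mse`, the patch size, the 0.75 mask ratio, and the choice to score reconstruction only on masked patches are all assumptions made for illustration.

```python
import numpy as np


def random_patch_mask(image, patch_size=16, mask_ratio=0.75, seed=0):
    """Split an (H, W, C) image into non-overlapping patches and mask a random subset.

    Hypothetical helper: patch_size and mask_ratio defaults are illustrative.
    Returns (patches, visible_patches, mask), where mask[i] is True if patch i is hidden.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    num_patches = gh * gw
    num_masked = int(round(mask_ratio * num_patches))

    # Choose which patches to hide, uniformly at random without replacement.
    rng = np.random.default_rng(seed)
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True

    # Rearrange (H, W, C) into (num_patches, patch_size * patch_size * C), row-major over the grid.
    patches = (
        image.reshape(gh, patch_size, gw, patch_size, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(num_patches, -1)
    )
    return patches, patches[~mask], mask


def masked_mse(pred, target, mask):
    """Mean squared error over masked patches only (an assumed scoring choice)."""
    return float(((pred - target) ** 2)[mask].mean())


image = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
patches, visible, mask = random_patch_mask(image, patch_size=8, mask_ratio=0.75)
```

In this sketch an encoder would see only `visible`, and a decoder's per-patch prediction would be compared against `patches` with `masked_mse`. The model itself is omitted.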

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation

onion-liu/BlendGAN NeurIPS 2021

Specifically, we first train a self-supervised style encoder on the generic artistic dataset to extract the representations of arbitrary styles.

Face Generation

279
1.15 stars / hour