NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

microsoft/nuwa 24 Nov 2021

To cover language, image, and video simultaneously across different scenarios, a 3D transformer encoder-decoder framework is designed that not only handles videos as 3D data but also adapts to texts and images as 1D and 2D data, respectively.

Text-to-Image Generation, Video Generation

959 stars
4.49 stars / hour
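The idea of treating text and images as lower-dimensional cases of 3D data can be illustrated with a small sketch. It assumes, hypothetically, that each modality has already been tokenized into integer ids, and lays text out as a 1 × 1 × s grid, an image as an h × w × 1 grid, and a video as an h × w × s grid, so one 3D model can index all three the same way. `Grid3D`, `text_to_3d`, `image_to_3d`, and `video_to_3d` are illustrative names, not part of microsoft/nuwa; the actual tokenizers and 3D attention are not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Grid3D:
    """Hypothetical container: a modality laid out as an h x w x s grid of token ids."""
    h: int
    w: int
    s: int
    tokens: List[int]  # flat list of length h * w * s, ordered frame, then row, then column

    def at(self, row: int, col: int, frame: int) -> int:
        # Flat index = frame * h * w + row * w + col.
        return self.tokens[(frame * self.h + row) * self.w + col]


def text_to_3d(token_ids: Sequence[int]) -> Grid3D:
    # 1D text: a single "pixel" (h = w = 1) with s steps along the sequence axis.
    return Grid3D(1, 1, len(token_ids), list(token_ids))


def image_to_3d(rows: Sequence[Sequence[int]]) -> Grid3D:
    # 2D image: an h x w grid with a single frame (s = 1).
    h, w = len(rows), len(rows[0])
    return Grid3D(h, w, 1, [t for row in rows for t in row])


def video_to_3d(frames: Sequence[Sequence[Sequence[int]]]) -> Grid3D:
    # 3D video: s frames, each an h x w grid.
    s, h, w = len(frames), len(frames[0]), len(frames[0][0])
    return Grid3D(h, w, s, [t for f in frames for row in f for t in row])
```

Because all three results share the same `at(row, col, frame)` accessor, downstream code does not need to branch on modality; for example, `video_to_3d([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]).at(0, 1, 1)` returns 6.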

MetaFormer is Actually What You Need for Vision

sail-sg/poolformer 22 Nov 2021

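Based on the observation that the attention-based token mixer in transformers can be replaced by spatial MLPs while the resulting models still perform well, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance.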

Image Classification, Semantic Segmentation

314 stars
1.32 stars / hour
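To make the MetaFormer idea concrete, here is a minimal single-channel sketch in which the token mixer is a pluggable function. The pooling mixer roughly follows the PoolFormer design (3 × 3 average pooling with stride 1, padded cells excluded from the mean, minus the input itself because the block already adds a residual). Layer normalization and the learned channel MLP are omitted, and `pool_mixer` and `metaformer_block` are hypothetical names, not the sail-sg/poolformer API.

```python
from typing import Callable, List

Grid = List[List[float]]  # one feature channel over an h x w token grid


def pool_mixer(x: Grid, k: int = 3) -> Grid:
    """Average over a k x k neighbourhood (stride 1, same padding, padded
    cells excluded from the mean), minus the input token itself."""
    h, w, r = len(x), len(x[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [x[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals) - x[i][j]
    return out


def metaformer_block(x: Grid,
                     token_mixer: Callable[[Grid], Grid],
                     channel_mlp: Callable[[float], float]) -> Grid:
    """Generic MetaFormer block: token mixing, then a per-token channel MLP,
    each wrapped in a residual connection (normalization omitted)."""
    mixed = token_mixer(x)
    y = [[x[i][j] + mixed[i][j] for j in range(len(x[0]))] for i in range(len(x))]
    return [[v + channel_mlp(v) for v in row] for row in y]
```

Swapping `pool_mixer` for an attention or spatial-MLP mixer changes only the `token_mixer` argument, which is the separation between general architecture and specific token mixer that the hypothesis is about.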

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

SysCV/pcan NeurIPS 2021

We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.

Multi-Object Tracking and Segmentation, Multiple Object Track and Segmentation

114 stars
1.03 stars / hour

Robust High-Resolution Video Matting with Temporal Guidance

PeterL1n/RobustVideoMatting 25 Aug 2021

We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.

Video Matting

4,771 stars
1.00 stars / hour

Resolution-robust Large Mask Inpainting with Fourier Convolutions

saic-mdal/lama 15 Sep 2021

We find that one of the main reasons modern inpainting systems struggle with large missing areas is the lack of an effective receptive field in both the inpainting network and the loss function.

Image Inpainting, LAMA

2,019 stars
0.89 stars / hour
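A small 1D sketch shows why operating in the frequency domain helps with receptive field. Multiplying each DFT coefficient by a weight and transforming back lets a single input sample influence every output position, whereas a 3-tap convolution only reaches its immediate neighbours. This is a simplification: the Fourier convolutions in saic-mdal/lama work on 2D feature maps with a learned convolution in the frequency domain, while here the per-frequency weights are fixed and hypothetical, and a naive O(n²) DFT stands in for an FFT.

```python
import cmath
from typing import Callable, List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_layer(x: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Scale each frequency coefficient, then invert. Every coefficient depends
    on every input sample, so the receptive field spans the whole signal."""
    X = dft(x)
    return [v.real for v in idft([w * c for w, c in zip(weights, X)])]


def local_conv(x: Sequence[float], kernel=(0.25, 0.5, 0.25)) -> List[float]:
    """Ordinary 3-tap convolution with zero padding: a local receptive field."""
    n, out = len(x), []
    for t in range(n):
        acc = 0.0
        for offset, k in zip((-1, 0, 1), kernel):
            if 0 <= t + offset < n:
                acc += k * x[t + offset]
        out.append(acc)
    return out


def affected_positions(layer: Callable[[List[float]], List[float]],
                       n: int, pos: int, eps: float = 1e-9) -> List[int]:
    """Output positions that change when input `pos` is set to 1."""
    base = layer([0.0] * n)
    x = [0.0] * n
    x[pos] = 1.0
    out = layer(x)
    return [i for i in range(n) if abs(out[i] - base[i]) > eps]
```

With the symmetric weights `w = [1.0, 0.5, 0.25, 0.125, 0.25, 0.5]`, `affected_positions(lambda x: spectral_layer(x, w), 6, 2)` returns all six positions, while `affected_positions(local_conv, 6, 2)` returns only `[1, 2, 3]`.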

PaddleViT

BR-IDL/PaddleViT 5 Oct 2021

PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

Image Classification, Object Detection

524 stars
0.80 stars / hour

KML: Using Machine Learning to Improve Storage Systems

sbu-fsl/kernel-ml 22 Nov 2021

Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput.

141 stars
0.79 stars / hour

Attention Mechanisms in Computer Vision: A Survey

MenghaoGuo/Awesome-Vision-Attentions 15 Nov 2021

Humans can naturally and effectively find salient regions in complex scenes.

Image Classification, Image Generation

868 stars
0.76 stars / hour

Masked Autoencoders Are Scalable Vision Learners

pengzhiliang/MAE-pytorch 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Object Detection, Self-Supervised Image Classification

993 stars
0.74 stars / hour
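The masking step can be sketched in a few lines. The sketch assumes the image has already been split into patches (196 for a 224 × 224 image with 16 × 16 patches) and uses a 75% mask ratio, the default reported for MAE. Only the visible patches would be fed to the encoder, and the reconstruction loss is averaged over the masked patches only. `random_masking` and `masked_mse` are illustrative names, not the pengzhiliang/MAE-pytorch API.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_masking(num_patches: int, mask_ratio: float = 0.75,
                   seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked), uniformly at random
    and without replacement."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def masked_mse(pred: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]],
               masked: Sequence[int]) -> float:
    """Mean squared pixel error, computed on the masked patches only."""
    per_patch = []
    for i in masked:
        diffs = [(p - t) ** 2 for p, t in zip(pred[i], target[i])]
        per_patch.append(sum(diffs) / len(diffs))
    return sum(per_patch) / len(per_patch)
```

With 196 patches and a 0.75 ratio, `random_masking` keeps 49 visible patches and masks 147, so the encoder processes only a quarter of the tokens, which is where MAE's training-cost savings come from.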

PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices

PaddlePaddle/PaddleDetection 1 Nov 2021

We investigate the applicability of the anchor-free strategy on lightweight object detection models.

Object Detection

5,575 stars
0.70 stars / hour