NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

microsoft/nuwa 24 Nov 2021

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.

Text-to-Image Generation Video Generation

MetaFormer is Actually What You Need for Vision

sail-sg/poolformer 22 Nov 2021

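Based on the observation that the attention-based module in transformers can be replaced by spatial MLPs while the resulting models still perform well, we hypothesize that the general architecture of the transformers, instead of the specific token mixer module, is more essential to the model's performance.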

Image Classification Semantic Segmentation

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

SysCV/pcan NeurIPS 2021

We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.

Multi-Object Tracking and Segmentation Multiple Object Track and Segmentation

Robust High-Resolution Video Matting with Temporal Guidance

PeterL1n/RobustVideoMatting 25 Aug 2021

We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.

Video Matting

OpenUE: An Open Toolkit of Universal Extraction from Text

zjunlp/openue EMNLP 2020

We introduce a prototype model and provide an open-source and extensible toolkit called OpenUE for various extraction tasks.

Event Extraction Intent Detection

Resolution-robust Large Mask Inpainting with Fourier Convolutions

saic-mdal/lama 15 Sep 2021

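We find that one of the main reasons modern image inpainting systems struggle with large missing areas, complex geometric structures, and high-resolution images is the lack of an effective receptive field in both the inpainting network and the loss function.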

Image Inpainting LAMA

KML: Using Machine Learning to Improve Storage Systems

sbu-fsl/kernel-ml 22 Nov 2021

Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput.

BR-IDL/PaddleViT 15 Jun 2021

PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

Self-Supervised Image Classification Semantic Segmentation

PaddlePaddle/PaddleRec 6 Jul 2020

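A large-scale recommendation algorithm library, including classic and state-of-the-art recommender system algorithms such as LR, Wide&Deep, DSSM, TDM, MIND, Word2Vec, DeepWalk, SSR, GRU4Rec, Youtube_dnn, NCF, GNN, FM, FFM, DeepFM, DCN, DIN, DIEN, DLRM, MMOE, PLE, ESMM, MAML, xDeepFM, DeepFEFM, NFM, AFM, RALM, Deep Crossing, PNN, BST, AutoInt, FGCNN, FLEN, and ListWise, along with classic recommender system datasets such as Criteo and MovieLens.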

Click-Through Rate Prediction

Masked Autoencoders Are Scalable Vision Learners

pengzhiliang/MAE-pytorch 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Object Detection Self-Supervised Image Classification
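
The random patch masking described in the MAE entry above can be sketched roughly as follows. This is a minimal illustration only, not the code from pengzhiliang/MAE-pytorch: the function name `random_patch_mask`, the 224x224 image size, the 16x16 patch size, and the 0.75 mask ratio are all assumptions chosen for the example, and the reconstruction step (predicting the pixels of the masked patches) is left out.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists, chosen at random.

    Hypothetical helper for illustration; the ratio default is an assumption.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # a random permutation of the patch indices
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])   # patches hidden from the encoder
    visible = sorted(indices[num_masked:])  # patches the encoder would see
    return visible, masked


# Assumed setup: a 224x224 image cut into 16x16 patches gives a 14x14 grid.
grid = 224 // 16
visible, masked = random_patch_mask(grid * grid, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

In a full pipeline, an encoder would presumably process only the `visible` patches, and a decoder would be trained to reconstruct the pixels at the `masked` positions.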

