A ConvNet for the 2020s

facebookresearch/ConvNeXt 10 Jan 2022

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)

Domain Generalization Image Classification +2

2,181
4.27 stars / hour

Masked Autoencoders Are Scalable Vision Learners

facebookresearch/mae 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Domain Generalization Object Detection +3

2,041
1.74 stars / hour
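
The MAE summary above describes masking random patches and reconstructing the missing pixels. As a rough illustration only, the masking step could be sketched as below in plain Python. This is not the facebookresearch/mae implementation: the function names, the `mask_ratio` parameter, the nested-list image format, and the choice to score reconstruction with mean squared error over masked patches are all assumptions made for this sketch.

```python
import random


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into flattened, non-overlapping patch x patch tiles.

    Tiles are returned in row-major order. H and W are assumed to be
    multiples of `patch` (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            patches.append(tile)
    return patches


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Pick a random subset of patch indices to hide.

    Returns (visible, masked) as sorted lists of indices.
    `mask_ratio` is a hypothetical parameter: the fraction of patches to mask.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def reconstruction_mse(pred_patches, true_patches, masked):
    """Mean squared error over the masked patches only (illustrative choice)."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred_patches[i], true_patches[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 4x4 "image" split into four 2x2 patches.
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    patches = split_into_patches(image, patch=2)
    visible, masked = random_patch_mask(len(patches), mask_ratio=0.75, seed=0)
    print("visible patch indices:", visible)
    print("masked patch indices:", masked)
```

In a real model, an encoder would see only the visible patches and a decoder would predict the pixels of the masked ones; here `reconstruction_mse` just shows where such a prediction would be compared against the original pixels.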

Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI

BoltzmannEntropy/interviews.ai 30 Dec 2021

The second edition of Deep Learning Interviews is home to hundreds of fully-solved problems, from a wide range of key topics in AI.

2,969
1.48 stars / hour

Detecting Twenty-thousand Classes using Image-level Supervision

facebookresearch/Detic 7 Jan 2022

For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without fine-tuning.

Image Classification

622
1.13 stars / hour

Extracting Triangular 3D Models, Materials, and Lighting From Images

nvlabs/tiny-cuda-nn 24 Nov 2021

We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.

475
0.97 stars / hour

Layered Neural Atlases for Consistent Video Editing

ykasten/layered-neural-atlases 23 Sep 2021

We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video.

Style Transfer Video Editing +2

187
0.76 stars / hour

Plenoxels: Radiance Fields without Neural Networks

sxyu/svox2 9 Dec 2021

We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis.

1,359
0.54 stars / hour

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

xmu-xiaoma666/External-Attention-pytorch 5 Oct 2021

In this paper, we ask the following question: is it possible to combine the strengths of CNNs and ViTs to build a light-weight and low latency network for mobile vision tasks?

Image Classification Object Detection

3,580
0.52 stars / hour

CoAtNet: Marrying Convolution and Attention for All Data Sizes

xmu-xiaoma666/External-Attention-pytorch NeurIPS 2021

Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.

Ranked #1 on Image Classification on ImageNet (using extra training data)

Image Classification

3,574
0.51 stars / hour

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

sense-x/uniformer 12 Jan 2022

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy respectively.

Representation Learning

43
0.50 stars / hour