A ConvNet for the 2020s

facebookresearch/ConvNeXt 10 Jan 2022

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)

Domain Generalization, Image Classification, +2

1,994
5.70 stars / hour

Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI

BoltzmannEntropy/interviews.ai 30 Dec 2021

The second edition of Deep Learning Interviews is home to hundreds of fully-solved problems, from a wide range of key topics in AI.

2,943
2.39 stars / hour

Masked Autoencoders Are Scalable Vision Learners

facebookresearch/mae 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Domain Generalization, Object Detection, +3

1,850
1.46 stars / hour
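
The masking step described in the MAE summary above can be illustrated with a small NumPy sketch. It is not the code from facebookresearch/mae: the function names `random_patch_mask` and `masked_mse`, the patch size, the 0.75 mask ratio, and the choice to score reconstruction only on the hidden patches are all assumptions made for illustration.

```python
import numpy as np

def random_patch_mask(image, patch_size=16, mask_ratio=0.75, seed=0):
    """Split an (H, W, C) image into non-overlapping patches and hide a random subset.

    patch_size and mask_ratio are illustrative defaults (assumptions), not values
    taken from the listing above.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size

    # (H, W, C) -> (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh * gw, p * p * C)
    patches = (
        image.reshape(gh, patch_size, gw, patch_size, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(gh * gw, -1)
    )

    num_patches = gh * gw
    num_masked = int(round(num_patches * mask_ratio))

    # A random permutation decides which patches are hidden from the encoder.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_patches)
    masked_idx = np.sort(perm[:num_masked])
    visible_idx = np.sort(perm[num_masked:])

    return patches[visible_idx], visible_idx, masked_idx, patches

def masked_mse(pred, target, masked_idx):
    """Mean squared pixel error restricted to the hidden patches (an assumption)."""
    diff = pred[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))

if __name__ == "__main__":
    img = np.random.default_rng(1).random((224, 224, 3))
    visible, vis_idx, msk_idx, all_patches = random_patch_mask(img)
    # A real model would predict the hidden patches from `visible`;
    # here an all-zeros "prediction" stands in for one.
    loss = masked_mse(np.zeros_like(all_patches), all_patches, msk_idx)
    print(visible.shape, len(msk_idx), loss)
```

Only the visible patches would be passed to an encoder in such a setup, which is why the function returns them separately from the full patch array used as the reconstruction target.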

Detecting Twenty-thousand Classes using Image-level Supervision

facebookresearch/Detic 7 Jan 2022

For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without fine-tuning.

Image Classification

557
1.25 stars / hour

Plenoxels: Radiance Fields without Neural Networks

sxyu/svox2 9 Dec 2021

We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis.

1,353
1.08 stars / hour

Extracting Triangular 3D Models, Materials, and Lighting From Images

nvlabs/tiny-cuda-nn 24 Nov 2021

We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.

451
0.71 stars / hour

Layered Neural Atlases for Consistent Video Editing

ykasten/layered-neural-atlases 23 Sep 2021

We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video.

Style Transfer, Video Editing, +2

183
0.70 stars / hour

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

sense-x/uniformer 12 Jan 2022

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy, respectively.

Representation Learning

42
0.67 stars / hour

Language-driven Semantic Segmentation

isl-org/lang-seg 10 Jan 2022

We present LSeg, a novel model for language-driven semantic image segmentation.

Few-Shot Semantic Segmentation, Semantic Segmentation, +1

118
0.57 stars / hour

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

xmu-xiaoma666/External-Attention-pytorch 5 Oct 2021

In this paper, we ask the following question: is it possible to combine the strengths of CNNs and ViTs to build a light-weight and low latency network for mobile vision tasks?

Image Classification, Object Detection

3,548
0.52 stars / hour