A ConvNet for the 2020s

facebookresearch/ConvNeXt 10 Jan 2022

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)

Domain Generalization, Image Classification +2

2,075
4.69 stars / hour

Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI

BoltzmannEntropy/interviews.ai 30 Dec 2021

The second edition of Deep Learning Interviews is home to hundreds of fully-solved problems, from a wide range of key topics in AI.

2,969
1.90 stars / hour

Masked Autoencoders Are Scalable Vision Learners

facebookresearch/mae 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Domain Generalization, Object Detection +3

1,976
1.31 stars / hour

Detecting Twenty-thousand Classes using Image-level Supervision

facebookresearch/Detic 7 Jan 2022

For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without fine-tuning.

Image Classification

601
1.13 stars / hour

Extracting Triangular 3D Models, Materials, and Lighting From Images

nvlabs/tiny-cuda-nn 24 Nov 2021

We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations.

451
0.82 stars / hour

Plenoxels: Radiance Fields without Neural Networks

sxyu/svox2 9 Dec 2021

We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis.

1,359
0.76 stars / hour

Layered Neural Atlases for Consistent Video Editing

ykasten/layered-neural-atlases 23 Sep 2021

We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video.

Style Transfer, Video Editing +2

187
0.73 stars / hour

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

sense-x/uniformer 12 Jan 2022

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performance of 60.9% and 71.2% top-1 accuracy, respectively.

Representation Learning

42
0.60 stars / hour

Language-driven Semantic Segmentation

isl-org/lang-seg 10 Jan 2022

We present LSeg, a novel model for language-driven semantic image segmentation.

Few-Shot Semantic Segmentation, Semantic Segmentation +1

121
0.59 stars / hour

CoAtNet: Marrying Convolution and Attention for All Data Sizes

xmu-xiaoma666/External-Attention-pytorch NeurIPS 2021

Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.

Ranked #1 on Image Classification on ImageNet (using extra training data)

Image Classification

3,574
0.47 stars / hour