Omnivore: A Single Model for Many Visual Modalities

facebookresearch/omnivore 20 Jan 2022

Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data.

Action Classification, Action Recognition

128
1.11 stars / hour

Stitch it in Time: GAN-Based Facial Editing of Real Videos

rotemtzaban/STIT 20 Jan 2022

The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing.

Facial Editing

106
0.72 stars / hour

A ConvNet for the 2020s

facebookresearch/ConvNeXt 10 Jan 2022

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)

Domain Generalization, Image Classification

2,654
0.63 stars / hour

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

nvlabs/instant-ngp 16 Jan 2022

Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate.

Neural Radiance Caching

2,088
0.63 stars / hour

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

hpcaitech/colossalai 28 Oct 2021

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.

618
0.62 stars / hour

Deep Learning for Identifying Metastatic Breast Cancer

3dimaging/DeepLearningCamelyon 18 Jun 2016

The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies.

General Classification, Image Classification

62
0.61 stars / hour

General-Purpose Question-Answering with Macaw

allenai/macaw 6 Sep 2021

Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available.

Generative Question Answering

227
0.46 stars / hour

Explaining in Style: Training a GAN to explain a classifier in StyleSpace

google/explaining-in-style ICCV 2021

A natural source for such attributes is the StyleSpace of StyleGAN, which is known to generate semantically meaningful dimensions in the image.

Image Classification

114
0.39 stars / hour

CoAtNet: Marrying Convolution and Attention for All Data Sizes

xmu-xiaoma666/External-Attention-pytorch NeurIPS 2021

Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.

Ranked #1 on Image Classification on ImageNet (using extra training data)

Image Classification

3,790
0.31 stars / hour