Transformers in Medical Imaging: A Survey

fahadshamshad/awesome-transformers-in-medical-imaging 24 Jan 2022

Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators.

Image Classification Medical Image Denoising +4

146
1.73 stars / hour

Stitch it in Time: GAN-Based Facial Editing of Real Videos

rotemtzaban/STIT 20 Jan 2022

The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing.

Facial Editing

194
0.85 stars / hour

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

allenai/natural-instructions 18 Apr 2021

Humans (e.g., crowdworkers) have a remarkable ability to solve different tasks by simply reading textual instructions that define them and looking at a few examples.

Question Answering

99
0.77 stars / hour

A ConvNet for the 2020s

facebookresearch/ConvNeXt 10 Jan 2022

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)

Domain Generalization Image Classification +2

2,797
0.70 stars / hour

General-Purpose Question-Answering with Macaw

allenai/macaw 6 Sep 2021

Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available.

Generative Question Answering

316
0.69 stars / hour

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

sense-x/uniformer 24 Jan 2022

Unlike typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local token affinity in shallow layers and global token affinity in deep layers, allowing it to tackle both redundancy and dependency for efficient and effective representation learning.

Object Detection Pose Estimation +3

163
0.69 stars / hour
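The UniFormer snippet contrasts local token affinity in shallow layers with global token affinity in deep layers. The sketch below illustrates that contrast on a 1D sequence of token vectors: the local aggregator mixes only neighbours inside a window, using weights that depend on relative position alone, while the global aggregator uses softmax-normalised dot products over all tokens, as in standard self-attention. This is a simplified, hypothetical illustration (no learned projections, no multiple heads, no 3D neighbourhoods), not the sense-x/uniformer implementation; `local_aggregate`, `global_aggregate`, and `rel_weights` are names invented here.

```python
import math

def local_aggregate(tokens, rel_weights):
    """Local affinity (illustrative): token i mixes neighbours j in a window,
    with a weight that depends only on the offset j - i."""
    radius = len(rel_weights) // 2          # rel_weights has odd length 2*radius + 1
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        acc = [0.0] * d
        for offset in range(-radius, radius + 1):
            j = i + offset
            if 0 <= j < n:                   # neighbours outside the sequence are skipped
                w = rel_weights[offset + radius]
                for k in range(d):
                    acc[k] += w * tokens[j][k]
        out.append(acc)
    return out

def global_aggregate(tokens):
    """Global affinity (illustrative): softmax-normalised dot products
    between token i and every token j, with no learned projections."""
    n, d = len(tokens), len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        scores = [scale * sum(a * b for a, b in zip(tokens[i], tokens[j])) for j in range(n)]
        peak = max(scores)                   # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]      # affinities of token i sum to 1
        out.append([sum(weights[j] * tokens[j][k] for j in range(n)) for k in range(d)])
    return out
```

Applying `local_aggregate` in early layers and `global_aggregate` in later ones mirrors the shallow/deep split the abstract describes; in this sketch the local version costs O(n · window) per layer, while the global one costs O(n²).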

Omnivore: A Single Model for Many Visual Modalities

facebookresearch/omnivore 20 Jan 2022

Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data.

Action Classification Action Recognition +3

173
0.61 stars / hour

Masked Autoencoders Are Scalable Vision Learners

facebookresearch/mae 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Domain Generalization Object Detection +3

2,482
0.50 stars / hour
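The MAE snippet describes masking random patches of the input image and reconstructing the missing pixels. As a minimal sketch of that idea (not the code in facebookresearch/mae), the Python below cuts an image into patches, hides a random subset of them, and scores a reconstruction only on the hidden patches. The helper names `patchify`, `random_patch_mask`, and `masked_mse`, the list-of-rows image format, and the default `mask_ratio=0.75` are assumptions chosen for illustration; no encoder or decoder is modelled.

```python
import random

def patchify(image, patch_size):
    """Cut an H x W image (a list of rows) into non-overlapping square patches, in row-major order."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append([row[left:left + patch_size] for row in image[top:top + patch_size]])
    return patches

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Randomly split patch indices into (visible, masked); the ratio default is an assumption."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)                      # uniform random permutation of patch indices
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])     # patches hidden from the encoder
    visible = sorted(order[num_masked:])    # patches the encoder would actually see
    return visible, masked

def masked_mse(pred, target, masked):
    """Pixel-wise mean squared error averaged over the masked patches only."""
    total, count = 0.0, 0
    for idx in masked:
        for p_row, t_row in zip(pred[idx], target[idx]):
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0

# Usage: an 8x8 toy image with 4x4 patches gives 4 patches, 3 of which are masked.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(image, 4)
visible, masked = random_patch_mask(len(patches), seed=0)
print(len(visible), len(masked))  # 1 3
```

Restricting the loss to masked patches means the model gains nothing from copying the visible input; it is scored only on pixels it never saw.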

Pixel-Perfect Structure-from-Motion with Featuremetric Refinement

cvg/pixel-perfect-sfm ICCV 2021

Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction.

3D Reconstruction

493
0.42 stars / hour

GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

snap-stanford/greaselm 21 Jan 2022

Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it.

Knowledge Graphs Question Answering

32
0.40 stars / hour