Transformers in Medical Imaging: A Survey

fahadshamshad/awesome-transformers-in-medical-imaging 24 Jan 2022

Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators.

Image Classification Medical Image Denoising +4

113
1.94 stars / hour

Stitch it in Time: GAN-Based Facial Editing of Real Videos

rotemtzaban/STIT 20 Jan 2022

The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing.

Facial Editing

177
0.86 stars / hour

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

allenai/natural-instructions 18 Apr 2021

Humans (e.g., crowdworkers) have a remarkable ability to solve different tasks simply by reading textual instructions that define them and looking at a few examples.

Question Answering

99
0.77 stars / hour

General-Purpose Question-Answering with Macaw

allenai/macaw 6 Sep 2021

Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available.

Generative Question Answering

305
0.75 stars / hour

Omnivore: A Single Model for Many Visual Modalities

facebookresearch/omnivore 20 Jan 2022

Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data.

Action Classification Action Recognition +3

158
0.68 stars / hour

A ConvNet for the 2020s

facebookresearch/ConvNeXt 10 Jan 2022

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model.

Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)

Domain Generalization Image Classification +2

2,761
0.66 stars / hour

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

sense-x/uniformer 24 Jan 2022

Unlike typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local token affinity in shallow layers and global token affinity in deep layers, allowing them to tackle both redundancy and dependency for efficient and effective representation learning.

Object Detection Pose Estimation +3

152
0.63 stars / hour

Masked Autoencoders Are Scalable Vision Learners

facebookresearch/mae 11 Nov 2021

Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.

Domain Generalization Object Detection +3

2,408
0.47 stars / hour
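
The masking step described in the MAE entry above ("we mask random patches of the input image") can be illustrated with a minimal, standard-library-only sketch. This is not the code from facebookresearch/mae: the function name `random_patch_mask`, the 2D-list image representation, the default `mask_ratio` of 0.75, and the choice to zero out hidden patches are all assumptions made for illustration, and the reconstruction step (a learned model) is not shown.

```python
import random


def random_patch_mask(image, patch_size, mask_ratio=0.75, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    `image` is a 2D list of pixel values (hypothetical representation).
    Returns the masked copy and the set of masked patch indices.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    grid_h, grid_w = height // patch_size, width // patch_size
    num_patches = grid_h * grid_w
    num_masked = round(mask_ratio * num_patches)

    # Pick which patches to hide, uniformly at random without replacement.
    rng = random.Random(seed)
    masked_ids = set(rng.sample(range(num_patches), num_masked))

    # Copy the image so the caller's data is left untouched.
    masked = [row[:] for row in image]
    for pid in masked_ids:
        top = (pid // grid_w) * patch_size
        left = (pid % grid_w) * patch_size
        for y in range(top, top + patch_size):
            for x in range(left, left + patch_size):
                masked[y][x] = 0  # assumption: hidden pixels are set to 0
    return masked, masked_ids


# Usage: an 8x8 image of ones split into sixteen 2x2 patches.
image = [[1] * 8 for _ in range(8)]
masked_image, masked_ids = random_patch_mask(image, patch_size=2)
```

In this sketch a reconstruction model would be trained to predict the original values at the masked positions; `masked_ids` records which patches those are.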

GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

snap-stanford/greaselm 21 Jan 2022

Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it.

Knowledge Graphs Question Answering

30
0.46 stars / hour

Patches Are All You Need?

locuslab/convmixer 24 Jan 2022

Despite its simplicity, we show that the ConvMixer outperforms the ViT, MLP-Mixer, and some of their variants for similar parameter counts and data set sizes, in addition to outperforming classical vision models such as the ResNet.

Image Classification

765
0.46 stars / hour