# Diffusion Models Beat GANs on Image Synthesis

11 May 2021

Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512.

398
1.08 stars / hour
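Classifier guidance, as described in this paper, shifts each denoising step's predicted mean by the classifier's gradient of log p(y | x). A minimal sketch of that mean shift (function name and the toy gradient values are illustrative, not from the paper's code):

```python
import numpy as np

def guided_mean(mu, sigma2, grad_log_py, scale=1.0):
    """One classifier-guided sampling step: shift the diffusion model's
    predicted mean by the step variance times the (scaled) gradient of
    log p(y | x_t) from a classifier on the noisy sample."""
    return mu + scale * sigma2 * grad_log_py

# Toy numbers; in practice grad_log_py comes from backprop through a
# noise-aware classifier (stand-in values here).
mu = np.zeros(4)
sigma2 = 0.1
grad = np.array([1.0, -1.0, 0.5, 0.0])
shifted = guided_mean(mu, sigma2, grad, scale=2.0)
```

Larger `scale` pushes samples harder toward the target class, trading diversity for fidelity.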

# PAFNet: An Efficient Anchor-Free Object Detector

28 Apr 2021

Therefore, a trade-off between effectiveness and efficiency is necessary in practical scenarios.

Ranked #56 on Object Detection on COCO test-dev (APL metric)

3,884
0.81 stars / hour

# Editing Conditional Radiance Fields

13 May 2021

In this paper, we explore enabling user editing of a category-level NeRF - also known as a conditional radiance field - trained on a shape category.

82
0.76 stars / hour

# Delving into Deep Imbalanced Regression

18 Feb 2021

We define Deep Imbalanced Regression (DIR) as learning from such imbalanced data with continuous targets, dealing with potential missing data for certain target values, and generalizing to the entire target range.

114
0.52 stars / hour
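A common baseline for the imbalanced-regression setting defined above is to bin the continuous targets and reweight samples by inverse bin frequency. A crude histogram version (the paper itself advocates smoother density estimates; this function is only an illustration):

```python
import numpy as np

def inverse_freq_weights(y, n_bins=10):
    """Bin continuous targets and weight each sample by the inverse
    frequency of its bin, so rare target regions get larger weights."""
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    idx = np.clip(np.digitize(y, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    w = 1.0 / counts[idx]   # rare bins -> large weights
    return w / w.mean()     # normalize to mean 1

# Skewed targets: 900 samples near 0, only 100 near 5.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 900), rng.normal(5, 1, 100)])
w = inverse_freq_weights(y)
```

Samples from the sparse target region around 5 receive larger weights than those in the dense region around 0.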

# Adversarial Open Domain Adaptation for Sketch-to-Photo Synthesis

12 Apr 2021

In this paper, we explore open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch given its class label, even when sketches of that class are missing from the training data.

1,037
0.45 stars / hour

# FNet: Mixing Tokens with Fourier Transforms

9 May 2021

We show that Transformer encoder architectures can be massively sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" input tokens.

Ranked #1 on Paraphrase Identification on Quora Question Pairs (F1-Accuracy Mean metric)

61
0.45 stars / hour
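FNet's token "mixing" replaces each self-attention sublayer with an unparameterized Fourier transform. A minimal sketch of that substitution (function name is illustrative; FNet keeps only the real part of a 2D DFT over the sequence and hidden dimensions):

```python
import numpy as np

def fourier_mix(x):
    """FNet-style token mixing: the real part of a 2D discrete Fourier
    transform applied over the sequence and hidden dimensions.
    Parameter-free, shape-preserving, and O(n log n) in sequence length."""
    # np.fft.fft2 transforms along the last two axes by default.
    return np.real(np.fft.fft2(x))

# Toy "embeddings": batch of 2 sequences, length 4, hidden size 8.
x = np.random.default_rng(0).normal(size=(2, 4, 8))
mixed = fourier_mix(x)
```

Because the transform has no learned parameters, the speedup comes for free; the feed-forward sublayers that follow do all the learning.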

# Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation

12 May 2021

In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis.

63
0.44 stars / hour

# Emerging Properties in Self-Supervised Vision Transformers

29 Apr 2021

In this paper, we ask whether self-supervised learning gives Vision Transformers (ViT) new properties that stand out compared to convolutional networks (convnets).

2,057
0.43 stars / hour

# DeepV2D: Video to Depth with Differentiable Structure from Motion

We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video.

370
0.42 stars / hour

# Twins: Revisiting the Design of Spatial Attention in Vision Transformers

28 Apr 2021

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed, showing that the design of spatial attention is critical to success in these tasks.

147
0.42 stars / hour