Therefore, a trade-off between effectiveness and efficiency is necessary in practical scenarios.
Ranked #56 on Object Detection on COCO test-dev (APL metric)
In this paper, we explore open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch given its class label, even when sketches of that class are missing from the training data.
Ranked #1 on Sketch-to-Image Translation on Scribble
We show that Transformer encoder architectures can be massively sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" input tokens.
Ranked #1 on Paraphrase Identification on Quora Question Pairs (F1-Accuracy Mean metric)
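The token-mixing idea described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes the "simple linear transformation" is a parameter-free 2D Fourier transform over the sequence and hidden dimensions, with only the real part retained, which is one instantiation of replacing a self-attention sublayer with a fixed mixing operation.

```python
import numpy as np

def fourier_mixing(x):
    """Hypothetical mixing sublayer: a parameter-free 2D FFT over the
    sequence and hidden dimensions, keeping only the real part.
    x: array of shape (seq_len, hidden_dim)."""
    return np.fft.fft2(x).real

# Toy input: 4 tokens, each with 8 hidden dimensions.
x = np.random.randn(4, 8)
mixed = fourier_mixing(x)
# The output has the same shape as the input, so it can drop in
# where a self-attention sublayer's output would go.
assert mixed.shape == x.shape
```

Because the transform has no learned parameters and runs in O(n log n) in the sequence length, it avoids the quadratic cost of self-attention, which is the source of the speedup claimed above.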
In the past few years, convolutional neural networks (CNNs) have achieved milestone results in medical image analysis.
In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).
Ranked #1 on Copy Detection on Copydays strong subset
Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed, and they show that the design of spatial attention is critical to success on these tasks.
Ranked #2 on Semantic Segmentation on ADE20K val