Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods.
Ranked #1 on Self-Supervised Image Classification on ImageNet (finetuned, using extra training data)
The discriminator of ContraGAN judges the authenticity of given samples while minimizing a contrastive objective that captures the relations between training images.
Ranked #6 on Conditional Image Generation on CIFAR-10 (FID metric)
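ContraGAN's actual loss (2C) conditions the contrastive objective on class labels; as a rough illustration of the idea, here is a minimal supervised InfoNCE-style loss sketch in which embeddings of same-label images are pulled together and others pushed apart. The function name, temperature value, and toy data are all assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

def contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised InfoNCE-style contrastive loss (simplified sketch):
    same-label pairs are positives, all other non-self pairs negatives."""
    # L2-normalize so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                  # pairwise similarity logits
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)            # exclude self-pairs
    losses = []
    for i in range(n):
        pos = (labels == labels[i]) & not_self[i]   # same-label positives
        if not pos.any():
            continue
        # log-softmax over all non-self pairs, averaged over positives
        log_prob = sim[i] - np.log(np.exp(sim[i][not_self[i]]).sum())
        losses.append(-log_prob[pos].mean())
    return float(np.mean(losses))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))                   # 8 toy image embeddings
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])      # 4 classes, 2 images each
print(contrastive_loss(emb, labels))
```

Because the log-sum-exp in the denominator always dominates any single positive logit, the loss is non-negative and is driven toward zero only when positives score far above negatives.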
We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling.
Ranked #1 on Image Generation on LSUN Bedroom 256 x 256 (FID-10k-training-steps metric)
First, WIT is (at the time of writing) the largest multimodal dataset, exceeding the next largest by 3x in the number of image-text examples.
Ranked #1 on Text-Image Retrieval on WIT
Humans have a natural instinct to identify unknown object instances in their environments.
Ranked #1 on Open World Object Detection on PASCAL VOC 2007
Building open-domain chatbots is a challenging area for machine learning research.
Monster Mash is a new sketch-based modeling and animation tool that allows you to quickly sketch a character, inflate it into 3D, and promptly animate it.
By stacking the TNT blocks, we build the TNT model for image recognition.
Ranked #5 on Image Classification on CIFAR-10
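A TNT (Transformer-in-Transformer) block runs an inner transformer over the pixel embeddings inside each patch and an outer transformer over the patch embeddings, folding the inner result back into the outer tokens; stacking such blocks yields the full model. The sketch below uses a bare single-head self-attention with identity projections and toy dimensions, which are simplifying assumptions, not the paper's configuration.

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity
    Q/K/V projections -- a bare-bones stand-in for a transformer layer."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ x

def tnt_block(patch_emb, pixel_emb, proj):
    """One TNT block: inner attention over each patch's pixel tokens,
    fold the result into the patch tokens, then outer attention."""
    pixel_emb = pixel_emb + self_attention(pixel_emb)           # inner
    folded = pixel_emb.reshape(pixel_emb.shape[0], -1) @ proj   # fuse
    patch_emb = patch_emb + folded[None]                        # (1, P, D)
    patch_emb = patch_emb + self_attention(patch_emb)           # outer
    return patch_emb, pixel_emb

# toy shapes: 4 patches, each split into 16 "pixel" sub-tokens
rng = np.random.default_rng(0)
patch_emb = rng.normal(size=(1, 4, 32))        # outer (patch) tokens
pixel_emb = rng.normal(size=(4, 16, 8))        # inner (pixel) tokens
proj = rng.normal(size=(16 * 8, 32)) * 0.01    # pixel-to-patch projection

for _ in range(4):                             # stack four TNT blocks
    patch_emb, pixel_emb = tnt_block(patch_emb, pixel_emb, proj)
print(patch_emb.shape)  # (1, 4, 32)
```

The key property to notice is that stacking preserves both token shapes, so blocks compose freely; the real model adds layer norm, MLPs, and learned projections at each step.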
Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks.
Unlike the recently proposed Transformer model (e.g., ViT), which is specially designed for image classification, we propose the Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting the Transformer to various dense prediction tasks.
Ranked #31 on Object Detection on COCO minival
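What makes PVT suitable for dense prediction is its shrinking pyramid of token maps: each stage re-patches the previous feature map, producing multi-scale features at strides 4, 8, 16, and 32 like a CNN backbone. The sketch below just computes those stage shapes; the per-stage patch sizes and channel widths follow the PVT-Small configuration, which is an assumption of this sketch.

```python
def pvt_stage_shapes(h, w, patch_sizes=(4, 2, 2, 2), dims=(64, 128, 320, 512)):
    """Token-map shapes of PVT's four pyramid stages: each stage
    downsamples the spatial grid by its patch size, so detection and
    segmentation heads receive a feature pyramid of decreasing
    resolution and increasing channel width."""
    shapes = []
    for p, d in zip(patch_sizes, dims):
        h, w = h // p, w // p          # spatial grid after re-patching
        shapes.append((h, w, d))
    return shapes

print(pvt_stage_shapes(224, 224))
# [(56, 56, 64), (28, 28, 128), (14, 14, 320), (7, 7, 512)]
```

The cumulative strides (4, 8, 16, 32) match the C2-C5 levels that FPN-style detection heads expect, which is why PVT can replace a ResNet backbone without changing the head.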