Unsupervised Semantic Segmentation with Language-Image Pre-training
5 papers with code • 10 benchmarks • 7 datasets
A semantic segmentation task which does not utilise any human supervision, except for a backbone initialised with features pre-trained on image-level labels.
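The recipe shared by the methods below can be summarised in a few lines: a frozen language-image backbone supplies per-patch visual features and per-class text embeddings in a shared space, and each patch is assigned to its most similar class name, with no pixel-level labels involved. The sketch below illustrates this with placeholder tensors; `zero_shot_segment` and the random features are hypothetical stand-ins, not any specific paper's API.

```python
# Minimal sketch of zero-shot segmentation from a language-image backbone,
# assuming per-patch visual features and class-name text embeddings that live
# in the same embedding space. Features here are random placeholders.
import torch
import torch.nn.functional as F

def zero_shot_segment(patch_feats, class_text_feats, h, w):
    """patch_feats: (N, D) features for N = h*w patches.
    class_text_feats: (C, D) embeddings of C class-name prompts.
    Returns an (h, w) map of predicted class indices -- no pixel labels used."""
    patch_feats = F.normalize(patch_feats, dim=-1)
    class_text_feats = F.normalize(class_text_feats, dim=-1)
    sim = patch_feats @ class_text_feats.t()   # (N, C) cosine similarities
    return sim.argmax(dim=-1).view(h, w)       # best-matching class per patch

# toy usage with random features standing in for real encoder outputs
h, w, d, num_classes = 14, 14, 512, 5
seg = zero_shot_segment(torch.randn(h * w, d), torch.randn(num_classes, d), h, w)
print(seg.shape)  # torch.Size([14, 14])
```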
Most implemented papers
GroupViT: Semantic Segmentation Emerges from Text Supervision
With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.
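A hedged sketch of how such zero-shot transfer can work at inference time: learned group tokens are matched against class-name text embeddings, and each patch inherits the label of the group it is assigned to. The tensors below are random placeholders for what the real model produces internally; this is an illustration of the idea, not GroupViT's implementation.

```python
# Group tokens -> class labels via text similarity, then patches inherit the
# label of their group. All tensors are placeholders for model outputs.
import torch
import torch.nn.functional as F

patches, groups, classes, dim = 196, 8, 5, 256
group_feats = F.normalize(torch.randn(groups, dim), dim=-1)        # learned group tokens
text_feats  = F.normalize(torch.randn(classes, dim), dim=-1)       # class-prompt embeddings
assign      = torch.softmax(torch.randn(patches, groups), dim=-1)  # patch-to-group assignment

group_labels = (group_feats @ text_feats.t()).argmax(dim=-1)       # class per group
patch_labels = group_labels[assign.argmax(dim=-1)]                 # class per patch, no fine-tuning
print(patch_labels.shape)  # torch.Size([196])
```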
ReCo: Retrieve and Co-segment for Zero-shot Transfer
Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs required for deployment.
Extract Free Dense Labels from CLIP
Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.
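For context, CLIP's open-vocabulary zero-shot recognition amounts to scoring an image against text embeddings of arbitrary class prompts. A minimal sketch using the openai/CLIP package (`pip install git+https://github.com/openai/CLIP.git`) is shown below; the image path and class list are hypothetical placeholders.

```python
# Zero-shot, open-vocabulary image recognition with CLIP: rank class prompts
# by image-text similarity, with no task-specific training.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)  # hypothetical image path
texts = clip.tokenize([f"a photo of a {c}" for c in ["dog", "cat", "car"]]).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(texts)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(probs)  # similarity-based class probabilities over the prompt list
```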
Perceptual Grouping in Contrastive Vision-Language Models
In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs
Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task.
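The contrastive learning (CL) objective these methods build on is typically an image-text InfoNCE loss: matched image/caption embeddings are pulled together while mismatched pairs in the batch are pushed apart. The sketch below shows this objective with placeholder features; the encoders and hyperparameters are assumptions, not a specific paper's setup.

```python
# CLIP-style image-text contrastive (InfoNCE) loss on a batch of paired
# image and caption embeddings. Features are random placeholders.
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(img_feats, txt_feats, temperature=0.07):
    img_feats = F.normalize(img_feats, dim=-1)
    txt_feats = F.normalize(txt_feats, dim=-1)
    logits = img_feats @ txt_feats.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(img_feats))             # i-th image matches i-th caption
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = clip_style_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```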