K-means for unsupervised instance segmentation using a self-supervised transformer
Instance segmentation is a fundamental computer vision task that assigns every pixel to an appropriate class and localizes each object with a bounding box. However, collecting pixel-level segmentation labels is more resource- and time-consuming than collecting classification or detection labels. Herein, we present a novel approach, iterative mask refinement using a self-supervised transformer (IMST), which performs class-agnostic unsupervised instance segmentation using simple K-means clustering and a self-supervised vision transformer. IMST generates pseudo-ground-truth labels that can be used to train an off-the-shelf instance segmentation model. The pseudo labels demonstrate improved performance on multiple datasets. The instance segmentation model trained on the pseudo labels outperforms state-of-the-art unsupervised instance segmentation methods on COCO20k (+4.0 average precision (AP)) and COCO val2017 (+2.6 AP) without modifications to the training loss or architecture. We demonstrate that our method can be extended to tasks such as single- and multiple-object discovery and supervised fine-tuning for instance segmentation, outperforming previous methods in each case.
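The abstract does not spell out how IMST clusters features or refines its masks. As a rough illustration of the general idea only, and not the authors' IMST algorithm, the sketch below runs plain K-means on per-patch feature vectors (the kind a self-supervised ViT produces for a grid of image patches) and turns one cluster into a coarse foreground mask. The function names `kmeans` and `patch_mask`, the farthest-point initialization, the L2 normalization, and the "cluster touching the border least is foreground" rule are all assumptions made for this example; the synthetic features stand in for real transformer outputs.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on x of shape (n_points, dim); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Farthest-point (maximin) initialization: a random first centroid, then
    # repeatedly add the point farthest from all centroids chosen so far.
    chosen = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in chosen], axis=0)
        chosen.append(x[d2.argmax()])
    centroids = np.stack(chosen)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; empty clusters stay put.
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids


def patch_mask(features, grid_h, grid_w, k=2):
    """Cluster (grid_h * grid_w, dim) patch features into a boolean foreground mask.

    Assumed heuristic: the cluster with the smallest share of its patches on the
    image border is treated as the foreground object.
    """
    # L2-normalize so Euclidean distance tracks cosine similarity.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    labels, _ = kmeans(feats, k)
    labels = labels.reshape(grid_h, grid_w)
    border = np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]])
    border_counts = np.bincount(border, minlength=k)
    sizes = np.bincount(labels.ravel(), minlength=k)
    # Empty clusters get +inf so they can never be picked as foreground.
    ratio = np.where(sizes > 0, border_counts / np.maximum(sizes, 1), np.inf)
    return labels == int(ratio.argmin())


if __name__ == "__main__":
    # Synthetic 8x8 patch grid: a 4x4 "object" in the center, background elsewhere.
    rng = np.random.default_rng(0)
    h, w, dim = 8, 8, 16
    obj = np.zeros((h, w), dtype=bool)
    obj[2:6, 2:6] = True
    base = np.where(obj[..., None], np.eye(dim)[1], np.eye(dim)[0])
    feats = (base + 0.05 * rng.standard_normal((h, w, dim))).reshape(h * w, dim)
    print(patch_mask(feats, h, w).astype(int))
```

A real pipeline would extract the features from a pretrained self-supervised ViT and would need more than one cluster-and-threshold pass to separate multiple instances; the iterative refinement that gives IMST its name is not reproduced here.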
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Benchmark |
|---|---|---|---|---|---|---|
| Single-object discovery | COCO_20k | IMST | CorLoc | 72.2 | #1 | |
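CorLoc (correct localization) is commonly defined for single-object discovery as the percentage of images whose single predicted box overlaps at least one ground-truth box with an intersection over union (IoU) of 0.5 or more. The sketch below computes it under that standard definition, assuming boxes given as `(x1, y1, x2, y2)` in a consistent coordinate system; the helper names `iou` and `corloc` are hypothetical and not taken from the IMST code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(predictions, ground_truths, threshold=0.5):
    """Percentage of images whose predicted box matches any ground-truth box.

    predictions: one predicted box per image.
    ground_truths: for each image, a list of ground-truth boxes.
    """
    hits = sum(
        any(iou(pred, gt) >= threshold for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [[(0, 0, 10, 10)], [(20, 20, 30, 30)]]
    print(corloc(preds, gts))  # one hit out of two images -> 50.0
```

Because only one box per image is scored, CorLoc rewards finding the most salient object rather than every instance, which is why it is reported for single-object discovery rather than instance segmentation.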