Fewer is More: Image Segmentation Based Weakly Supervised Object Detection with Partial Aggregation
We address two major failure modes of weakly supervised object detectors. Because most weakly supervised object detection methods rely on pre-generated proposals, they tend to make two kinds of false detections: (i) grouping multiple object instances into one bounding box, and (ii) focusing on only parts of an object rather than the whole object. We propose an image segmentation framework that helps detect individual instances correctly. Each input image is first segmented into several sub-images based on the proposal overlaps, which uncouples the grouped objects. The batch of sub-images is then fed into a convolutional network to train an object detector. Within each sub-image, a partial aggregation strategy dynamically selects a portion of the proposal-level scores to produce the sub-image-level output. This regularizes the model to learn contextual knowledge about the object content. Finally, the outputs of the sub-images are pooled together to form the model prediction. The ideas are implemented with a VGG-D backbone so that the results are comparable with recent state-of-the-art weakly supervised methods. Extensive experiments on the PASCAL VOC datasets show that the proposed model outperforms the alternatives on detection, localization, and classification tasks.
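The abstract does not give the exact selection rule or pooling operator, so the following is only a minimal sketch of how partial aggregation could look, not the authors' implementation. It assumes that "a portion of the proposal-level scores" means the top-scoring fraction per class, and that sub-image outputs are pooled with an element-wise maximum. The names `partial_aggregate`, `pool_sub_images`, and `keep_ratio` are hypothetical.

```python
import numpy as np


def partial_aggregate(scores, keep_ratio=0.5):
    """Sum only the top-scoring fraction of proposal scores per class.

    scores: array of shape (num_proposals, num_classes).
    keep_ratio: assumed hyperparameter (not from the paper) giving the
        fraction of proposals that contribute to the sub-image output.
    Returns an array of shape (num_classes,).
    """
    scores = np.asarray(scores, dtype=float)
    num_proposals = scores.shape[0]
    # Keep at least one proposal so the output is never empty.
    k = max(1, int(np.ceil(keep_ratio * num_proposals)))
    # Sort each class column in descending order and keep the first k rows.
    top_k = -np.sort(-scores, axis=0)[:k]
    return top_k.sum(axis=0)


def pool_sub_images(sub_image_outputs):
    """Combine sub-image outputs into one image-level prediction.

    Element-wise max is an assumption; the abstract only says the
    outputs are "pooled together".
    """
    return np.max(np.stack(sub_image_outputs), axis=0)


# Toy example: 4 proposals, 2 classes, keep the top half per class.
proposal_scores = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5], [0.2, 0.1]]
sub_image_a = partial_aggregate(proposal_scores, keep_ratio=0.5)
sub_image_b = np.array([0.9, 0.2])
image_level = pool_sub_images([sub_image_a, sub_image_b])
```

With full aggregation (`keep_ratio=1.0`) every proposal contributes to the output, whereas the partial variant in this sketch ignores low-scoring proposals within each sub-image.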
Results from the Paper
Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Weakly Supervised Object Detection | PASCAL VOC 2007 | SegAgg | mAP | 43.9 | #31 |