CPR reduces the semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point.
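The refinement step described above can be illustrated with a minimal sketch. It assumes a per-pixel class score map is already available and takes the mean position of the high-scoring pixels around the annotated point as the semantic centre. The function name `refine_point`, the square neighbourhood, and the `radius` and `threshold` parameters are hypothetical choices for illustration, not the published CPR procedure.

```python
def refine_point(score_map, point, radius=2, threshold=0.5):
    """Replace an annotated (row, col) point with a nearby semantic centre.

    score_map: 2D list of per-pixel class scores (assumed given).
    radius, threshold: hypothetical parameters for this sketch.
    """
    n_rows, n_cols = len(score_map), len(score_map[0])
    r0, c0 = point
    # Collect pixels in the clipped square neighbourhood whose score is high enough.
    kept = [
        (r, c)
        for r in range(max(r0 - radius, 0), min(r0 + radius, n_rows - 1) + 1)
        for c in range(max(c0 - radius, 0), min(c0 + radius, n_cols - 1) + 1)
        if score_map[r][c] > threshold
    ]
    if not kept:
        # No confident pixel nearby: fall back to the original annotation.
        return (float(r0), float(c0))
    # The semantic centre is the mean position of the kept pixels.
    mean_r = sum(r for r, _ in kept) / len(kept)
    mean_c = sum(c for _, c in kept) / len(kept)
    return (mean_r, mean_c)
```

Under these assumptions, two annotators who click different parts of the same object are both mapped towards the same centre, which is the sense in which the variance of the annotation is reduced.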
Nevertheless, weakly supervised semantic segmentation methods are proficient in utilizing intra-class feature consistency to capture the boundary contours of the same semantic regions.
The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting.
In this study, we introduce the P2RBox network, which leverages point annotations and a mask generator to create mask proposals that are then filtered by our Inspector Module and Constrainer Module.
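As a rough illustration of the proposal-then-filter idea, the sketch below keeps only the mask proposals that contain the annotated point and returns the one with the highest score. It does not reproduce the Inspector Module or the Constrainer Module: the `select_mask` helper, the representation of a mask as a set of pixels, and the single scalar score per proposal are all assumptions made for this example.

```python
def select_mask(proposals, point):
    """Pick one mask proposal for an annotated point.

    proposals: list of (mask, score) pairs, where mask is a set of
               (row, col) pixels and score is a hypothetical quality score.
    point:     the annotated (row, col) location.
    Returns the best-scoring mask that contains the point, or None.
    """
    # Discard proposals that do not cover the annotated point.
    containing = [(mask, score) for mask, score in proposals if point in mask]
    if not containing:
        return None
    # Among the remaining proposals, keep the one with the highest score.
    best_mask, _ = max(containing, key=lambda item: item[1])
    return best_mask
```

In a full pipeline the selected mask would then be converted into a box that supervises the detector; that step is omitted here.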
However, the performance gap between point-supervised object detection (PSOD) and bounding-box-supervised detection remains large.
While extensive research has focused on framework design and loss functions, this paper shows that the sampling strategy plays an equally important role.