Despite the progress of deep learning in medical image segmentation, standard CNNs are still not fully adopted in clinical settings as they lack robustness and interpretability.
In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol in which full supervision is limited to a small held-out set that does not overlap with the test set.
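A minimal sketch of such a data split, under assumptions of our own (the function name, set names, and fractions below are illustrative and not taken from the paper): the training set carries only image-level labels, a small fully annotated held-out set is used for model and hyperparameter selection, and the test set stays disjoint from both.

```python
import random

def split_for_wsol_protocol(samples, heldout_frac=0.05, test_frac=0.2, seed=0):
    """Illustrative split: weakly labeled training set, small fully annotated
    held-out set for model selection, and a disjoint test set.
    The fractions are placeholders, not the paper's values."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    n_test = int(len(samples) * test_frac)
    n_held = int(len(samples) * heldout_frac)
    test_set = samples[:n_test]                    # evaluated once, never used for tuning
    heldout_set = samples[n_test:n_test + n_held]  # small set with full (e.g. box/mask) labels
    train_weak = samples[n_test + n_held:]         # image-level labels only
    return train_weak, heldout_set, test_set
```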
The pairing stage calculates the error per sample, sorts the samples, and pairs them by strategy (hardest with easiest); the mixing stage then merges each pair using mixup, $\lambda x_1 + (1-\lambda) x_2$.
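As a rough sketch of the two stages, assuming a classification setting with per-sample cross-entropy as the error measure (these choices, and all tensor names, are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def pair_and_mix(x, y, model, alpha=0.4):
    """Pairing stage: rank samples by per-sample error and pair the hardest
    with the easiest. Mixing stage: merge each pair with mixup."""
    with torch.no_grad():
        per_sample_err = F.cross_entropy(model(x), y, reduction="none")
    order = torch.argsort(per_sample_err, descending=True)  # hardest first
    hard, easy = order, torch.flip(order, dims=[0])          # hardest paired with easiest
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mixed = lam * x[hard] + (1.0 - lam) * x[easy]          # lambda*x1 + (1-lambda)*x2
    y_hard, y_easy = y[hard], y[easy]                         # labels are mixed in the loss
    return x_mixed, y_hard, y_easy, lam
```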
We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view.
The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively.
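The toy module below only illustrates this information flow, not the paper's actual parsing model; the two-level body hierarchy, feature dimensions, and layer choices are assumptions. Bottom-up, part features are composed into a whole-body representation; top-down, that representation is propagated back to refine each part.

```python
import torch
import torch.nn as nn

class BodyHierarchyInference(nn.Module):
    """Toy bottom-up / top-down inference over a two-level body hierarchy
    (e.g. {head, torso, arms, legs} -> full body). Purely illustrative."""
    def __init__(self, dim=256, num_parts=4):
        super().__init__()
        self.compose = nn.Linear(num_parts * dim, dim)  # bottom-up: parts -> whole
        self.decompose = nn.Linear(2 * dim, dim)        # top-down: whole -> each part

    def forward(self, part_feats):                      # part_feats: [B, num_parts, dim]
        b, p, d = part_feats.shape
        body = torch.relu(self.compose(part_feats.reshape(b, p * d)))   # compositional
        body_exp = body.unsqueeze(1).expand(b, p, d)
        parts_refined = torch.relu(
            self.decompose(torch.cat([part_feats, body_exp], dim=-1)))  # decompositional
        return body, parts_refined
```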
Through parametric message passing, AGNN efficiently captures richer, higher-order relations between video frames, enabling a more complete understanding of video content and more accurate foreground estimation.
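A minimal sketch of attention-guided message passing among frame-node embeddings is given below; the specific aggregation, the GRU-style node update, and the dimensions are assumptions for illustration, and AGNN's actual gated, convolutional updates differ in detail.

```python
import torch
import torch.nn as nn

class FrameMessagePassing(nn.Module):
    """Minimal parametric message passing over a fully connected graph of frames.
    Each node is a frame embedding; edges carry learned pairwise affinities."""
    def __init__(self, dim=256, steps=3):
        super().__init__()
        self.edge = nn.Linear(dim, dim, bias=False)  # parametric edge (affinity) weights
        self.update = nn.GRUCell(dim, dim)           # node update from aggregated messages
        self.steps = steps

    def forward(self, nodes):                        # nodes: [num_frames, dim]
        h = nodes
        for _ in range(self.steps):
            affinity = h @ self.edge(h).t()          # pairwise frame-to-frame relations
            attn = torch.softmax(affinity, dim=-1)
            messages = attn @ h                      # aggregate messages from all frames
            h = self.update(messages, h)             # iterative refinement of frame states
        return h                                     # refined embeddings for segmentation
```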