Leveraging Large-Scale Weakly Labeled Data for Semi-Supervised Mass Detection in Mammograms

Mammographic mass detection is an integral part of a computer-aided diagnosis system. Annotating a large number of mammograms at the pixel level in order to train a mass detection model in a fully supervised fashion is costly and time-consuming. This paper presents a novel self-training framework for semi-supervised mass detection with soft image-level labels generated from diagnosis reports by Mammo-RoBERTa, a RoBERTa-based natural language processing model fine-tuned on the fully labeled data and associated mammography reports. Starting with a fully supervised model trained on the data with pixel-level masks, the proposed framework iteratively refines the model using the entire weakly labeled dataset (with image-level soft labels) in a self-training fashion. A novel sample selection strategy is proposed to identify the most informative samples for each iteration, based on the current model output and the soft labels of the weakly labeled data. A soft cross-entropy loss and a soft focal loss are also designed to serve as the image-level and pixel-level classification losses, respectively. Our experimental results show that the proposed semi-supervised framework improves mass detection accuracy over the supervised baseline and outperforms previous state-of-the-art semi-supervised approaches with weakly labeled data, in some cases by a large margin.
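
The abstract does not give the form of the two losses, so the following is only a minimal sketch of what losses against soft labels can look like, not the paper's definitions. It assumes a binary setting in which `p` is the predicted mass probability and `q` is a soft label in [0, 1]. The focal variant weights the cross-entropy by `|q - p| ** gamma`, which is one common way to generalize focal loss to soft targets. The function names and the `gamma` and `eps` parameters are hypothetical.

```python
import math


def soft_cross_entropy(p, q, eps=1e-7):
    """Binary cross-entropy between prediction p and soft label q in [0, 1].

    Assumed form for illustration; the paper's exact loss may differ.
    """
    # Clamp the prediction so that log() never receives 0.
    p = min(max(p, eps), 1.0 - eps)
    return -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))


def soft_focal_loss(p, q, gamma=2.0, eps=1e-7):
    """Cross-entropy down-weighted where the prediction is already close to q.

    Assumed generalization: the modulating factor |q - p| ** gamma reduces to
    the usual (1 - p) ** gamma focal factor when q is a hard label of 1.
    """
    return abs(q - p) ** gamma * soft_cross_entropy(p, q, eps)


if __name__ == "__main__":
    # A confident, nearly correct prediction contributes far less under the
    # focal weighting than under plain soft cross-entropy.
    print(soft_cross_entropy(0.9, 1.0))
    print(soft_focal_loss(0.9, 1.0))
```

In this sketch the image-level loss would be applied once per image against the report-derived soft label, and the pixel-level loss once per pixel of the predicted mask.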
