Breast cancer (BC) has become the greatest threat to women’s health worldwide. Clinically, identifying axillary lymph node (ALN) metastasis and other tumor characteristics, such as ER and PR status, is important for evaluating prognosis and guiding treatment for BC patients.

Several studies have attempted to predict ALN status and other tumor clinical characteristics from clinicopathological data and genetic testing scores. However, these methods are often limited by relatively poor predictive value and the high cost of genetic testing. Recently, deep learning (DL) has enabled rapid advances in computational pathology. DL can perform high-throughput feature extraction on medical images and analyze the correlation between primary tumor features and ALN status or other clinical characteristics. To our knowledge, no prior study has preoperatively predicted ALN metastasis and other tumor clinical characteristics from whole slide images (WSIs) of primary BC samples.

Our paper introduces a new dataset, Early Breast Cancer Core-Needle Biopsy WSI (BCNB), which includes core-needle biopsy WSIs of early breast cancer patients and the corresponding clinical data. The WSIs were examined and annotated by two independent, experienced pathologists who were blinded to all patient-related information.

Using this dataset, we studied a deep learning approach based on multiple instance learning (MIL) for preoperatively predicting the metastatic status of ALN. The best model achieved an AUC of 0.831 on the independent test cohort. For more details, please refer to our paper.
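To make the MIL setting and the AUC metric concrete, here is a minimal sketch in plain Python. It is not the model from our paper: it assumes the simplest MIL aggregation (max pooling over per-patch scores), and the function names `bag_score_max` and `auc` as well as the toy scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def bag_score_max(instance_scores: Sequence[float]) -> float:
    """Max-pooling MIL (an assumed, simplest-case aggregator).

    Each WSI is a "bag" of patch-level scores; the bag score is the
    highest patch score, so one suspicious patch can flag the slide.
    """
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    return max(instance_scores)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive bag outscores
    a random negative bag (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative bag")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (made up): four slides, each with a few patch scores.
bags = [[0.1, 0.9, 0.2], [0.2, 0.3], [0.4, 0.8], [0.1, 0.1, 0.5]]
labels = [1, 0, 1, 0]  # slide-level labels only; patches are unlabeled
scores = [bag_score_max(b) for b in bags]
print(scores, auc(labels, scores))
```

Only the slide-level label is used during evaluation, which is what makes the task weakly supervised. Other aggregators, such as mean pooling or learned attention weights, could be swapped in for `bag_score_max` without changing the evaluation code.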

The dataset contains WSIs of 1058 patients, and only some of the tumor regions in each WSI are annotated. In addition to the WSIs, we provide the clinical characteristics of each patient, which include age, tumor size, tumor type, ER, PR, HER2, HER2 expression, histological grading, surgical, Ki67, molecular subtype, number of lymph node metastases, and the metastatic status of ALN. The dataset has been de-identified and contains no private patient information.
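As a rough illustration of how the clinical data can be paired with a prediction target, the sketch below models a few of the characteristics listed above and derives a binary metastasis label from the number of lymph node metastases. The class name `PatientRecord`, its field names, and the example values are all hypothetical; they do not reflect the actual column names or label encoding of the released files.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """A subset of the per-patient clinical characteristics.

    Field names are illustrative assumptions, not the dataset's schema.
    """
    patient_id: str
    age: int
    tumor_size: float
    num_ln_metastases: int


def aln_positive(record: PatientRecord) -> bool:
    """Binary ALN label: positive if at least one metastatic lymph node."""
    if record.num_ln_metastases < 0:
        raise ValueError("number of lymph node metastases cannot be negative")
    return record.num_ln_metastases > 0


# Made-up example records, not real patients.
records = [
    PatientRecord("p001", 52, 2.1, 0),
    PatientRecord("p002", 47, 3.0, 3),
]
targets = [aln_positive(r) for r in records]
print(targets)
```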

In our paper, we used this dataset to study the prediction of the metastatic status of ALN, which is a weakly supervised classification task. Other research based on the dataset is also feasible, such as predicting histological grading, molecular subtype, HER2, ER, and PR. We place no restriction on the research topic, and any research based on our dataset is welcome.

Please note that the dataset may be used for education and research only; commercial and clinical use is not permitted. Any use of this dataset must comply with the license.

