Data Labeling Impact on Deep Learning Models in Digital Pathology: a Breast Cancer Case Study
Image data labeling is a vital step in training deep learning models. Previous studies on data labeling have not considered its impact on model performance, focusing instead on problems such as the burden of labeling large datasets or the design of labeling tools. It seems clear that labeling errors have a significant impact and should be corrected. In the medical domain, however, proper data labeling is hard to ensure. In general, trained engineers are asked to annotate histology images, which leads to labeling errors. The aim of this study is to highlight the impact of data labeling on deep learning models. For that purpose, deep learning models are trained on two sets of annotations produced with different levels of expertise. The results show the importance of including expert knowledge in deep learning model development. The impact of data labeling is demonstrated through a case study on scoring the labeling index of the proliferation biomarker Ki-67.