We present a deep convolutional neural network for breast cancer screening
exam classification, trained and evaluated on over 200,000 exams (over
1,000,000 images). Our network achieves an AUC of 0.895 in predicting the
presence of cancer in the breast when tested on the screening population. We
attribute the high accuracy of our model to a two-stage training procedure,
which allows us to use a very high-capacity patch-level network to learn from
pixel-level labels alongside a network learning from macroscopic breast-level
labels. To validate our model, we conducted a reader study with 14 readers,
each reading 720 screening mammogram exams, and found our model to be as
accurate as experienced radiologists when presented with the same data.
Finally, we show that a hybrid model, which averages the probability of
malignancy predicted by a radiologist with that predicted by our neural
network, is more accurate than either of the two separately. To better
understand our results,
we conduct a thorough analysis of our network's performance on different
subpopulations of the screening population and of its design, training
procedure, errors, and internal representations.
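The hybrid reader is a per-exam average of two probabilities of malignancy, and readers are compared by AUC. The following is a minimal pure-Python sketch, not the study's code. It computes AUC as the probability that a randomly chosen cancer-positive exam scores higher than a randomly chosen negative one (ties count as half), and it forms the hybrid by averaging radiologist and network probabilities. The function names `roc_auc` and `hybrid`, the optional `weight` parameter, and the four-exam toy data are hypothetical illustrations. The toy scores are invented so that the radiologist and the network each mis-rank a different positive/negative pair.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative).

    Exhaustive pairwise comparison (the Mann-Whitney U statistic divided by
    the number of pairs); ties count as half a win. O(P * N) time.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative exam")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hybrid(radiologist: Sequence[float], network: Sequence[float],
           weight: float = 0.5) -> List[float]:
    """Per-exam weighted average of two probabilities of malignancy.

    weight=0.5 is the plain average; other weights are a hypothetical knob.
    """
    return [weight * r + (1.0 - weight) * m
            for r, m in zip(radiologist, network)]


# Hypothetical toy data (not from the study): two cancers, two negatives.
labels = [1, 1, 0, 0]
radiologist = [0.9, 0.3, 0.4, 0.1]  # ranks the second cancer below a negative
network = [0.5, 0.8, 0.2, 0.6]      # ranks the first cancer below a negative

print(roc_auc(labels, radiologist))                    # 0.75
print(roc_auc(labels, network))                        # 0.75
print(roc_auc(labels, hybrid(radiologist, network)))   # 1.0
```

On this toy data, averaging helps only because the two readers make different ranking errors. When their errors coincide, an average cannot fix them. Pairwise counting is used for clarity; for large exam sets, a rank-based or library implementation is preferable.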