We propose a new notion of `non-linearity' of a network layer with respect to
an input batch that is based on its proximity to a linear system, which is
reflected in the non-negative rank of the activation matrix. We measure this
non-linearity by applying non-negative factorization to the activation matrix.
Considering batches of similar samples, we find that high non-linearity in deep
layers is indicative of memorization. Furthermore, by applying our approach
layer-by-layer, we find that the mechanism for memorization consists of
distinct phases. We perform experiments on fully-connected and convolutional
neural networks trained on several image and audio datasets. Our results
demonstrate that as an indicator for memorization, our technique can be used to
perform early stopping.