On the Inductive Bias of a CNN for Distributions with Orthogonal Patterns

1 Jan 2021 · Alon Brutzkus, Amir Globerson

Training overparameterized convolutional neural networks with gradient-based optimization is the most successful learning method for image classification. However, their generalization properties are far from understood. In this work, we consider a simplified image classification task where images contain orthogonal patches and are learned with a 3-layer overparameterized convolutional network and stochastic gradient descent (SGD). We empirically identify a novel phenomenon of SGD in our setting, where the dot-product between each learned pattern detector and its detected pattern is governed by the pattern statistics in the training set. We call this phenomenon Pattern Statistics Inductive Bias (PSI) and empirically verify it in a large number of instances. We prove that in our setting, if a learning algorithm satisfies PSI, then its sample complexity is $O(d^2\log(d))$, where $d$ is the filter dimension. In contrast, we show a VC dimension lower bound which is exponential in $d$. We perform experiments with overparameterized CNNs on a variant of MNIST with non-orthogonal patches, and show that the empirical observations are in line with our analysis.
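The following is a minimal NumPy sketch of the kind of setting the abstract describes: images built from orthogonal patches, a small 3-layer network (convolution, ReLU, max-pooling, fixed linear readout) trained with SGD, and an inspection of the dot products between learned filters and patterns that PSI is concerned with. The label rule, loss, readout weights, and hyperparameters here are illustrative assumptions, not the paper's exact construction.

```python
# Sketch of the orthogonal-patch setting: a 3-layer CNN-like model
# (conv filters -> ReLU -> max-pool over patches -> fixed +/-1 readout)
# trained with SGD, followed by an inspection of <w_j, p_i> values.
# All specifics (label rule, hinge loss, learning rate) are assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, num_patterns, patches_per_image, k = 8, 4, 3, 32   # k filters: overparameterized
patterns = np.eye(d)[:num_patterns]                   # orthogonal patterns (standard basis)

def sample_image():
    """An image is a sequence of patches, each an orthogonal pattern;
    the (assumed) label is whether pattern 0 appears anywhere."""
    idx = rng.integers(0, num_patterns, size=patches_per_image)
    x = patterns[idx]                                  # (patches_per_image, d)
    y = 1.0 if (idx == 0).any() else -1.0
    return x, y

train = [sample_image() for _ in range(2000)]

W = 0.1 * rng.standard_normal((k, d))                  # learned conv filters
a = np.where(np.arange(k) < k // 2, 1.0, -1.0) / k     # fixed second-layer weights

def forward(x):
    pre = x @ W.T                                      # (patches, k): per-patch responses
    act = np.maximum(pre, 0.0)                         # ReLU
    pooled = act.max(axis=0)                           # max-pool over patches, shape (k,)
    winners = act.argmax(axis=0)                       # winning patch per filter
    return pooled @ a, pooled, winners

lr = 0.1
for epoch in range(20):
    rng.shuffle(train)
    for x, y in train:
        out, pooled, winners = forward(x)
        if y * out < 1.0:                              # hinge loss active: take an SGD step
            grad_pooled = -y * a
            for j in range(k):
                if pooled[j] > 0.0:                    # ReLU/max subgradient routes to one patch
                    W[j] -= lr * grad_pooled[j] * x[winners[j]]

# PSI-style inspection: dot products between learned filters and the patterns.
dots = W @ patterns.T                                  # (k, num_patterns)
print("mean |<w_j, p_i>| per pattern:", np.abs(dots).mean(axis=0).round(3))
```

Printing the filter-pattern dot products after training is the point of the sketch: under PSI, how large these products end up should track how often each pattern occurs in the (positively and negatively labeled) training images.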
