On Learning Read-once DNFs With Neural Networks

1 Jan 2021  ·  Ido Bronstein, Alon Brutzkus, Amir Globerson

Learning functions over Boolean variables is a fundamental problem in machine learning, but little is known about how neural networks learn such functions. Because learning these functions in the distribution-free setting is NP-hard, they are unlikely to be efficiently learnable by networks in that case. However, when the inputs are sampled from the uniform distribution, an important class of functions known to be efficiently learnable is read-once DNFs. Here we focus on this setting, where the functions are learned by a convex neural network trained with gradient descent. We first observe empirically that the learned neurons align with the terms of the DNF, even though there are many zero-error networks that do not have this property. Thus, the learning process has a clear inductive bias towards such logical formulas. To gain a better theoretical understanding of this phenomenon, we focus on minimizing the population risk. We show that this risk can be minimized by multiple networks, from ones that memorize the data to ones that compactly represent the DNF. We then set out to understand why gradient descent "chooses" the compact representation. We use a computer-assisted proof to establish this inductive bias for relatively small DNFs, and use it to design a procedure for reconstructing the DNF from the learned network. We then provide theoretical insights into the learning process and the loss surface that further explain the resulting inductive bias. For example, we show that the neurons in solutions with minimum $\ell_2$-norm weights are also aligned with the terms of the DNF. Finally, we show empirically that our results extend to high-dimensional DNFs, more general network architectures, and tabular datasets.
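The abstract describes a concrete setting: a read-once DNF target over Boolean variables, inputs drawn from the uniform distribution, a one-hidden-layer network trained with gradient descent, and an inspection of whether the learned neurons align with the DNF terms. The sketch below illustrates that setting under explicit assumptions of our own (a specific DNF, a ReLU network with fixed output weights, logistic loss, and arbitrary hyperparameters); it is not the paper's exact configuration, and helper names such as `dnf_label` and `forward` are hypothetical.

```python
# Minimal sketch, assuming: a hand-picked read-once DNF, uniform Boolean inputs,
# a one-hidden-layer ReLU network with fixed output weights trained by full-batch
# gradient descent on a logistic loss, and a heuristic neuron/term alignment check.
import torch

torch.manual_seed(0)

d = 12                                   # input dimension (assumption)
terms = [[0, 1, 2], [3, 4, 5], [6, 7]]   # read-once DNF: disjoint variable sets

def dnf_label(x):
    """+1 if some term has all of its variables equal to 1, else -1."""
    term_sat = torch.stack([x[:, t].min(dim=1).values for t in terms], dim=1)
    return 2.0 * term_sat.max(dim=1).values - 1.0

# Uniformly sampled Boolean inputs and their DNF labels.
n = 4096
X = (torch.rand(n, d) > 0.5).float()
y = dnf_label(X)

# One-hidden-layer ReLU network; only the first layer is trained, the output
# weights are fixed (an assumption mirroring the "convex network" setup).
k = 32
W = (0.1 * torch.randn(k, d)).requires_grad_()
b = torch.zeros(k, requires_grad=True)
a = torch.ones(k) / k                    # fixed positive output weights

def forward(X):
    # The constant output shift of 0.5 is an illustrative choice.
    return torch.relu(X @ W.t() + b) @ a - 0.5

opt = torch.optim.SGD([W, b], lr=0.5)
for step in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.soft_margin_loss(forward(X), y)  # logistic loss
    loss.backward()
    opt.step()

# Heuristic alignment check: for each neuron with non-negligible norm, report
# the input variables carrying most of its weight; an "aligned" neuron
# concentrates its mass on the variables of a single DNF term.
with torch.no_grad():
    for i in range(k):
        w = W[i]
        if w.norm() < 1e-2:
            continue
        top = torch.topk(w.abs(), 3).indices.tolist()
        print(f"neuron {i:2d}: largest weights on variables {sorted(top)}")
```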
