On the Approximation Properties of Random ReLU Features

10 Oct 2018 · Yitong Sun, Anna Gilbert, Ambuj Tewari

We study the approximation properties of random ReLU features through their reproducing kernel Hilbert space (RKHS). We first prove a universality theorem for the RKHS induced by random features whose feature maps take the form of neural network nodes. This universality result implies that the random ReLU features method is a universally consistent learning algorithm. We then prove that, despite the universality of the RKHS induced by random ReLU features, composing functions in the RKHS produces substantially more complicated functions that are harder to approximate than functions in the RKHS itself. We also prove that such composite functions can be efficiently approximated by multi-layer ReLU networks with bounded weights. This depth separation result shows that random ReLU features models share the same weakness as other shallow models. In experiments, we show that random ReLU features perform comparably to random Fourier features while generally incurring a lower computational cost. We also demonstrate that when the target function is the composite function described in the depth separation theorem, 3-layer neural networks indeed outperform both random ReLU features and 2-layer neural networks.
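To make the method concrete, below is a minimal sketch of the random ReLU features approach described in the abstract: inputs are mapped through ReLU units with randomly drawn, fixed weights, and only a linear model is trained on top of the resulting features. The sampling distribution (unit-sphere directions on the bias-augmented input) and the ridge-regression readout are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np
from sklearn.linear_model import Ridge

def random_relu_features(X, n_features=512, rng=None):
    """Map inputs X of shape (n_samples, d) to random ReLU features.

    Sketch only: random directions are drawn uniformly from the unit
    sphere in R^{d+1}, acting on the input augmented with a constant
    coordinate so that each node has a bias term.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    X_aug = np.hstack([X, np.ones((n, 1))])          # append constant coordinate
    W = rng.standard_normal((d + 1, n_features))      # random directions
    W /= np.linalg.norm(W, axis=0, keepdims=True)     # normalize to the unit sphere
    return np.maximum(X_aug @ W, 0.0)                 # ReLU of random projections

# Usage: the random features stay fixed; only a linear readout is fitted.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 5))
y = np.sin(X.sum(axis=1))                             # toy regression target
Phi = random_relu_features(X, n_features=512, rng=0)
model = Ridge(alpha=1e-3).fit(Phi, y)
print("train R^2:", model.score(Phi, y))
```

Because the nonlinear features are random and fixed, training reduces to a convex linear problem, which is what makes the comparison with random Fourier features (and the lower per-feature cost of ReLU maps) meaningful.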
