Compelling ReLU Network Initialization and Training to Leverage Exponential Scaling with Depth

29 Nov 2023 · Max Milkert, David Hyde, Forrest Laine

A neural network with ReLU activations may be viewed as a composition of piecewise linear functions. For such networks, the number of distinct linear regions expressed over the input domain has the potential to scale exponentially with depth, but it is not expected to do so when the initial parameters are chosen randomly. This poor scaling can necessitate the use of overly large models to approximate even simple functions. To address this issue, we introduce a novel training strategy: we first reparameterize the network weights in a manner that forces an exponential number of activation patterns to manifest. Training first on these new parameters provides an initial solution that can later be refined by updating the underlying model weights. This approach allows us to produce function approximations that are several orders of magnitude better than their randomly initialized counterparts.
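The exponential scaling mentioned above can be illustrated with a small sketch. It is not the reparameterization proposed in the paper; it uses the classic "tent map" construction, assuming a one-dimensional input on [0, 1] and a hand-picked layer with two ReLU units. The names `tent_layer`, `deep_tent`, and `count_linear_pieces` are hypothetical helpers introduced only for this illustration.

```python
def relu(x):
    return max(0.0, x)


def tent_layer(x):
    # One hidden layer with two ReLU units (hand-picked weights, an assumption
    # for illustration): rises from 0 to 1 on [0, 0.5], falls back to 0 on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_tent(x, depth):
    # Compose the same layer `depth` times; each composition folds the
    # input interval once more and doubles the number of linear pieces.
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_linear_pieces(f, n_intervals=4096):
    # Sample f on a dyadic grid over [0, 1] so that the arithmetic is exact
    # in floating point, then count how often the slope changes.
    xs = [k / n_intervals for k in range(n_intervals + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[k + 1] - ys[k]) * n_intervals for k in range(n_intervals)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)


# Number of linear pieces for depths 1 through 6.
pieces_by_depth = {
    d: count_linear_pieces(lambda x, d=d: deep_tent(x, d)) for d in range(1, 7)
}
print(pieces_by_depth)
```

With these particular weights, a network of depth d uses only 2d ReLU units yet expresses 2^d linear pieces, so the count doubles with every added layer. Randomly chosen weights would typically not fold the input this way, which is the gap the abstract describes.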

Methods

BASE • ReLU
