Superseding Model Scaling by Penalizing Dead Units and Points with Separation Constraints

ICLR 2020 Anonymous

In this article, we study a proposal that enables to train extremely thin (4 or 8 neurons per layer) and relatively deep (more than 100 layers) feedforward networks without resorting to any architectural modification such as Residual or Dense connections, data normalization or model scaling. We accomplish that by alleviating two problems... (read more)

PDF Abstract


No code implementations yet. Submit your code now


Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.