15 Sep 2023 • Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh
A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion.
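A minimal sketch of this idea, assuming NumPy and *linear* residual branches f_i(x) = A_i x. This restriction is made here for illustration only; the source does not limit itself to the linear case. A stack of layers x <- x + A_i x computes (I + A_n)...(I + A_1) x. Multiplying this out gives a sum of terms grouped by order: order 0 is x itself, order 1 is the sum of the single-branch terms A_i x, order 2 collects products A_j A_i x with j > i, and so on. This grouping is what resembles the orders of a Taylor expansion. The function names are hypothetical.

```python
import itertools
import numpy as np

def residual_stack(branches, x):
    """Forward pass through residual layers x <- x + A @ x (linear branches assumed)."""
    for A in branches:
        x = x + A @ x
    return x

def expansion_terms(branches, x):
    """Return the expanded output grouped by order r = 0..n.

    Order r sums the products of r distinct branches, applied in layer
    order (earlier layers act first, i.e. appear rightmost in the product).
    """
    n = len(branches)
    terms = []
    for r in range(n + 1):
        total = np.zeros_like(x)
        for subset in itertools.combinations(range(n), r):
            y = x
            for i in subset:  # subset is ascending, so earlier layers are applied first
                y = branches[i] @ y
            total = total + y
        terms.append(total)
    return terms

rng = np.random.default_rng(0)
branches = [0.1 * rng.standard_normal((3, 3)) for _ in range(4)]
x = rng.standard_normal(3)

# The sum over all orders reproduces the residual stack exactly.
print(np.allclose(sum(expansion_terms(branches, x)), residual_stack(branches, x)))  # True
```

Truncating the list returned by `expansion_terms` after a few orders gives a low-order approximation of the stack, analogous to truncating a Taylor series.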
15 Sep 2022 • Tomas Hrycej, Bernhard Bermeitinger, Siegfried Handschuh
Choosing an appropriate number of attention heads, on the one hand, and of transformer encoders, on the other, is an important decision for Computer Vision (CV) tasks that use the Transformer architecture.
15 Sep 2022 • Tomas Hrycej, Bernhard Bermeitinger, Siegfried Handschuh
For strongly nonlinear tasks, both algorithm classes find only fairly poor solutions in terms of mean square error relative to the output variance.
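The measure "mean square error relative to the output variance" can be sketched as follows. This assumes the ratio MSE / Var(y), using the population variance of the targets. The source excerpt does not spell out its exact normalization, and the function name is hypothetical. A ratio of 0 is a perfect fit. A ratio of 1 means the model does no better than always predicting the mean output.

```python
def relative_mse(y_true, y_pred):
    """Mean square error divided by the (population) variance of the targets.

    Assumed normalization: 0 = perfect fit, 1 = no better than predicting the mean.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var = sum((t - mean) ** 2 for t in y_true) / n
    if var == 0:
        raise ValueError("targets have zero variance; the ratio is undefined")
    return mse / var

y = [1, 2, 3, 4]
print(relative_mse(y, y))           # 0.0 (perfect fit)
print(relative_mse(y, [2.5] * 4))   # 1.0 (same as predicting the mean)
```

Under this reading, a "fairly poor" solution is one whose ratio stays well above 0, so much of the output variance remains unexplained.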
19 Jul 2019 • Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh
This does not directly contradict the theoretical findings. The superior representational capacity of deep networks may well be genuine, while finding the minimum of the mean square error for such deep networks is a substantially harder problem than for shallow ones.
27 Jun 2019 • Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh
Singular Value Decomposition (SVD) constitutes a bridge between linear algebra concepts and multi-layer neural networks: it is their linear analogy.
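A minimal sketch of the analogy, assuming NumPy and a purely linear two-layer network. The function name `svd_as_linear_network` is hypothetical. The SVD W = U S V^T splits a linear map into an input layer S_k V_k^T and an output layer U_k, with k hidden units. By the Eckart–Young theorem, keeping the k largest singular values gives the best such rank-k network in the Frobenius norm. The resulting error is the root of the sum of the squared discarded singular values.

```python
import numpy as np

def svd_as_linear_network(W, k):
    """Factor W (m x n) into a linear network x -> W2 @ (W1 @ x) with k hidden units.

    W1 (k x n): input -> hidden, the k leading right singular vectors scaled by
    their singular values. W2 (m x k): hidden -> output, the k leading left
    singular vectors.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = s[:k, None] * Vt[:k, :]
    W2 = U[:, :k]
    return W1, W2

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 4))
x = rng.standard_normal(4)

# With k equal to the full rank, the network reproduces W exactly.
W1, W2 = svd_as_linear_network(W, k=4)
print(np.allclose(W2 @ (W1 @ x), W @ x))  # True

# With a narrower hidden layer, the network computes the rank-k truncation,
# whose Frobenius error equals the norm of the discarded singular values.
W1, W2 = svd_as_linear_network(W, k=2)
s = np.linalg.svd(W, compute_uv=False)
print(np.isclose(np.linalg.norm(W - W2 @ W1), np.sqrt(np.sum(s[2:] ** 2))))  # True
```

Choosing k here plays the role of choosing the hidden-layer width in a neural network, which is what makes the SVD a linear counterpart of a bottleneck layer.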