1 code implementation • Preprints 2023 • Evgenii Pishchik
We assume that activation functions with trainable parameters can outperform those without, because the trainable parameters allow the model to "select" the shape of each activation function itself. However, this strongly depends on the architecture of the deep neural network and on the activation function in question.
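As a concrete illustration of the idea (not code from the paper), the sketch below uses PReLU, one of the simplest activations with a trainable parameter: the negative-side slope `alpha` is learned by gradient descent alongside the network weights, letting the model interpolate between ReLU-like (`alpha ≈ 0`) and linear (`alpha ≈ 1`) behavior. The target and learning rate here are toy values chosen for the demonstration.

```python
import numpy as np

def prelu(x, alpha):
    """Parametric ReLU: identity for x > 0, alpha * x otherwise.
    alpha is a trainable parameter, not a fixed hyperparameter."""
    return np.where(x > 0, x, alpha * x)

def prelu_grad_alpha(x):
    """Gradient of the PReLU output w.r.t. alpha (nonzero only for x <= 0)."""
    return np.where(x > 0, 0.0, x)

# Toy fit: learn alpha so the activation matches a target negative slope of 0.1.
x = np.array([-2.0, -1.0, 0.5, 3.0])
alpha = 0.25
target = np.where(x > 0, x, 0.1 * x)  # "ideal" activation for this toy setup
lr = 0.1
for _ in range(100):
    out = prelu(x, alpha)
    # Squared-error loss; chain rule through d(out)/d(alpha).
    grad = np.sum(2.0 * (out - target) * prelu_grad_alpha(x))
    alpha -= lr * grad
# alpha converges to the target slope 0.1
```

In a full network each activation layer could carry its own `alpha`, so different layers can settle on different effective shapes, which is the "selection" behavior described above.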
Ranked #78 on Image Classification on MNIST